ON HIDDEN VARIABLES:

VALUE AND EXPECTATION NO-GO THEOREMS 


ANDREAS BLASS AND YURI GUREVICH 

Abstract. No-go theorems assert that hidden-variable theories,
subject to appropriate hypotheses, cannot reproduce the predictions
of quantum theory. We examine two species of such theorems,
value no-go theorems and expectation no-go theorems. The
former assert that hidden variables cannot match the predictions
of quantum theory about the possible values resulting from
measurements; the latter assert that hidden variables cannot match
the predictions of quantum theory about the expectation values
of measurements. We sharpen the known results of both species,
which allows us to clarify the similarities and differences between
the two species. We also repair some flaws in existing definitions
and proofs.


1. Introduction 

This paper is about “no-go” theorems asserting the impossibility of
schemes for explaining the probabilistic aspects of quantum mechanics
in terms of ordinary, classical probability. Such schemes are often called
hidden-variable theories. They postulate that a quantum state, even if
it is a pure state and thus contains as much information as quantum
mechanics permits, actually describes an ensemble of systems with
different values for some additional, hidden properties that are not taken
into account in quantum mechanics. The ensemble given by a quantum
state is thus composed of sub-ensembles, each having specific values for
the hidden variables. The idea is that, once the values of these hidden
variables are specified, all the properties of the system become
determinate (or at least more determinate than quantum mechanics says).
Thus the randomness in quantum predictions results (entirely or at
least partially) from the randomness involved in choosing a particular
element, with particular values of the hidden variables, from the
ensemble that a quantum state describes.


Part of the first author’s work was done as a visiting researcher at Microsoft 
Research; another part was done as a visiting fellow at the Isaac Newton Institute 
for Mathematical Sciences. 
No-go theorems for hidden-variable interpretations of quantum
mechanics assert that, under reasonable assumptions, a hidden-variable
interpretation cannot reproduce the predictions of quantum mechanics.
There are many no-go theorems in the literature. Although they all
share the basic idea, “hidden-variable theories cannot succeed,” they
differ from one another in the particular description of what a
hidden-variable theory is and what is meant by succeeding. A typical no-go
theorem can be formulated in terms of a hypothesis saying what a
hidden-variable theory should look like and a conclusion saying that
certain predictions of quantum mechanics can never result from such a
theory. In this paper, we examine two species of such theorems, value
no-go theorems and expectation no-go theorems. We sharpen the
results of both species, which allows us to clarify both the similarities
and the differences between the two species.

The value approach originated in the work of Bell [2] and of Kochen
and Specker in the 1960s. A very readable overview of this work,
with some simplifications and historical information, is given by
Mermin [12]. Value no-go theorems establish that, under suitable
hypotheses, hidden-variable theories cannot reproduce the predictions of
quantum mechanics concerning the possible results of measurements. There
is no need to consider the probabilities of possible results or the
expectation values of measurements; the measured values alone provide
a discrepancy between hidden-variable theories and quantum theory.
The hypotheses that are used to deduce these theorems concern the
measurements of observables in quantum states.

The expectation approach was developed in the last decade by
Spekkens [16] and by Ferrie, Emerson, and Morris [6, 7, 8], with [8]
giving the sharpest result. In this approach, the discrepancy between
hidden-variable theories and quantum mechanics appears in the
predictions of the expected values of measurements. There is no need to
consider the actual values obtained by measurements or the probability
distributions over these values. The hypotheses that are used to
deduce these results concern the measurement of effects, i.e., the elements
of positive operator-valued measures (POVMs). Effects are
represented by Hermitian operators with spectrum in the real interval $[0,1]$.
They are regarded as representing yes-or-no questions, the probability
of “yes” for effect $E$ in state $|\psi\rangle$ being $\langle\psi|E|\psi\rangle$.

Although both approaches involve measurements associated to
Hermitian operators, they are different sorts of measurements. In the
value approach, Hermitian operators serve as observables, and measuring
one of them produces a number in its spectrum. In the expectation
approach, certain Hermitian operators serve as effects, and measuring
one of them produces “yes” or “no”, i.e., 1 or 0, even if the spectrum
contains (or even consists entirely of) other points. The only
Hermitian operators for which these two uses coincide are the projections,
the operators whose spectrum is included in $\{0,1\}$. We sharpen the
results of both approaches so that only projection measurements are
used.

The present work started with repairing various flaws in the
literature on expectation no-go theorems. Although the papers purport
to specify the exact assumptions needed to obtain their no-go results,
some of them bring in, afterward, an additional assumption of
convex-linearity; another erroneously claims that this assumption follows from
the others. In addition, the assumptions are sometimes ambiguous,
and one of the papers relies on an erroneous result of Bugajski [1],
which needs some additional hypotheses to become correct. We
explain the flaws that we found and how to circumvent them, and we
strengthen the Ferrie, Morris, and Emerson result in [8] by
substantially weakening the hypotheses. We do not need arbitrary effects, or
even arbitrary sharp effects, but only rank-1 projections. Accordingly,
we need convex-linearity only for the hidden-variable picture of states,
not for that of effects.

Theorem 1. For no quantum system are there a measurable space $\Lambda$,
a convex-linear map $T$ from density matrices $\rho$ to probability measures
on $\Lambda$, and a map $S$ from rank-1 projections $E$ in the Hilbert space of the
system into measurable functions from $\Lambda$ to $[0,1]$, such that
$\mathrm{Tr}(\rho E) = \int_\Lambda S(E)\, dT(\rho)$ for all $\rho$ and $E$.

Some of the literature on expectation no-go theorems emphasizes a
symmetry between states and effects. We explain why such symmetry
is to be expected only when the Hilbert space of states is
finite-dimensional and the space of possible values for the hidden variables is
not merely finite-dimensional but finite.

We formulate the value no-go theorems in terms of the maps, from
observables to real numbers, that a hidden-variable theory would
assign to individual systems. We define a value map $v$ for a set $\mathcal{O}$ of
Hermitian operators on a Hilbert space $\mathcal{H}$ to be a function that assigns
to each operator $A \in \mathcal{O}$ a number $v(A)$ in the spectrum of $A$ in such
a way that, for any pairwise commuting operators $A_1, \dots, A_n \in \mathcal{O}$,
the tuple $(v(A_1), \dots, v(A_n))$ belongs to the joint spectrum of the tuple
$(A_1, \dots, A_n)$. (The notion of joint spectra is uncommon in the
quantum literature, so we explain it and its relevant properties.) Our value
no-go theorem is close to one of Bell’s results, as interpreted by Mermin [12].
Theorem 2. Suppose that $\mathcal{H}$ is a Hilbert space of dimension $\geq 3$.

(1) There is a finite set of projections for which no value map exists.

(2) If $\dim(\mathcal{H}) < \infty$ then there is a finite set of rank-1 projections
for which no value map exists.

The desired finite sets of projections are constructed explicitly in
the proof. The condition $\dim(\mathcal{H}) \geq 3$ is necessary. In the case of
$\dim(\mathcal{H}) = 2$ there are counterexamples [2, 12] that produce not only
correct values but also correct probabilities for pure states; we slightly
simplify the verification of that. These counterexamples do not violate
Theorem 1 merely because they apply only to pure states and do not
admit convex linearity. The condition $\dim(\mathcal{H}) < \infty$ in (2) is also
necessary. If $\dim(\mathcal{H}) = \infty$ then the zero function is a value map for
the set of all finite-rank projections.
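To make the role of the joint spectrum concrete, here is a minimal sketch in Python. It assumes the standard finite-dimensional fact that, for matrices that are diagonal in a common basis, the joint spectrum is the set of tuples of diagonal entries taken at the same basis index. The helper name `joint_spectrum` and the representation of each matrix by its diagonal are illustrative choices, not notation from this paper.

```python
def joint_spectrum(diagonals):
    """Joint spectrum of commuting matrices that are diagonal in one basis.

    Each matrix is given by its tuple of diagonal entries (an assumption made
    for illustration); the joint spectrum is then the set of tuples of
    entries sharing a basis index.
    """
    dim = len(diagonals[0])
    return {tuple(d[i] for d in diagonals) for i in range(dim)}

# Rank-1 projections onto the three vectors of an orthonormal basis of a
# 3-dimensional space; they commute and sum to the identity.
P1, P2, P3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)

full = joint_spectrum([P1, P2, P3])
pair = joint_spectrum([P1, P2])

# A value map must send (P1, P2, P3) to a tuple in `full`, so exactly one of
# the three projections receives the value 1.
print((0, 0, 0) in full)  # is the all-zero assignment allowed on the triple?
print((0, 0) in pair)     # is it allowed on the pair alone?
```

In this toy case the all-zero assignment is excluded as soon as the projections sum to the identity, which finitely many rank-1 projections can do only in finite dimension.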

Note that there is no implication in either direction between our two
theorems. One says that a hidden-variable theory cannot predict the
correct values for measured quantities (though it might predict correct
expectations) while the other says that a hidden-variable theory cannot
predict the correct expectations (though it might predict the correct
values, with incorrect probabilities). Thus, there are two separate
reasons why hidden-variable theories must fail.

We postpone to future work a similar study of no-go theorems for
local hidden-variable theories. In these theories, a certain amount of
contextuality is allowed, which means that the measured value of an
observable can depend on which other, commuting observables are
measured along with it, but only if those other observables are local in a
suitable sense. There are value no-go theorems for such theories,
but they rely on a stronger notion of value map. Consider, for example,
two observables that do not commute and therefore cannot, in general,
be simultaneously measured. They might nevertheless share a common
eigenvector $|\psi\rangle$ and would then have simultaneous definite values when
the state of the system is $|\psi\rangle$. In this case, a hidden-variable theory
should provide a value map that assigns appropriately correlated values
for these two observables.

This paper is organized as follows. In Section 2 we describe in
detail the ingredients of various hidden-variable theories. Section 3 is
devoted to expectation no-go theorems. We begin by describing the
work of Spekkens [16] and of Ferrie, Emerson, and Morris [6, 7, 8],
pointing out the flaws that we found and suggesting how to circumvent
them. At the end of the section, we prove our expectation no-go result,
Theorem 1, which strengthens the result of Ferrie, Morris, and Emerson
in [8]. Section 4 is devoted to value no-go theorems, ending with the
proof of Theorem 2.

Section 5 is devoted to giving a mathematical basis for the intuitive
idea that a hidden-variable theory for one Hilbert space should
specialize to a hidden-variable theory for any closed subspace, because the
latter space just represents a subset of the states of the former. Thus a
no-go theorem for the subspace should imply a no-go theorem for the
larger space. We prove theorems that support this intuition in several
cases.

Section 6 examines an example, due to Bell [2] and described by
Mermin [12], of a hidden-variable theory for pure states in the case of
a two-dimensional Hilbert space. The example shows that the
assumption of dimension at least 3 cannot be omitted from Theorem 2. Our
expectation no-go result, Theorem 1, applies in all dimensions from 2
up, and we point out why the example does not contradict the theorem.

The paper has two appendices. The first discusses the notion of
convex-linearity, which played a role in some of the flaws we found in
the literature. The second presents a no-go theorem adapted to the
original framework described by Spekkens [16], minimally modified to
remove unintended aspects and ambiguities.

2. Hidden-Variable Theories 

In this section, we describe some of the differences between various
approaches to hidden variables. These differences include what sorts of
quantum states are considered, what sorts of measurements are
considered, and which predictions of quantum mechanics should be matched
by the hidden-variable theory.

2.1. States. Most hidden-variable theories begin with states in the
usual sense of quantum mechanics and seek to make their properties
more determinate by adjoining hidden variables. In some cases,
however, they begin with a more primitive notion, that of a preparation, a
way of producing systems in a specific quantum state. Different
preparations might produce the same state. In [16], Spekkens works with a
notion of ontological model of quantum theory, in which distribution
functions (describing how a quantum ensemble is composed of more
determinate sub-ensembles) are assigned to preparation procedures. He
gives the name “preparation noncontextuality” to the hypothesis that
different preparations of the same quantum state yield the same
distribution function, i.e., that the distribution function is determined by
the quantum state. This hypothesis is in force for most of [16], but
it is pointed out explicitly as a hypothesis that could, in principle, be
questioned.

Other hidden-variable theories begin with quantum states rather
than with preparations, so that preparation noncontextuality is built
into the foundational framework of these theories. They seek to
analyze quantum states as ensembles obtained by mixing sub-ensembles
with more determinate properties. These sub-ensembles are viewed in
different ways by the various theories, but these viewpoints are
ultimately equivalent. For example, Mermin [12] talks about individual
systems in the quantum ensemble while von Neumann talks about
dispersion-free sub-ensembles. Other authors do not refer
to the sub-ensembles explicitly but work with distribution functions
over a space whose points are best viewed as parametrizing such
sub-ensembles.

Even after one decides to work with quantum states, one still has a
choice whether to work only with pure states or to admit mixed states
as well. At first sight, the difference between these two options might
seem unimportant. After all, any mixed state is a weighted average of
pure states. So, given interpretations of pure states as ensembles, we
can use weighted mixtures of these ensembles to represent mixed states.
The situation is, however, more subtle. A single mixed state may be
represented as a weighted average of pure states in more than one way.
Can the associated weighted averages of ensembles depend on which
of these representations we use? In general, the answer is yes, and
then we do not obtain a single, well-defined ensemble to represent this
mixed state. Well-definedness of the ensemble representations of mixed
states is not automatic but rather imposes a non-trivial consistency
requirement on the representations of the pure states. In Section 6 we
shall describe an example, essentially due to Bell, of a hidden-variable
representation of pure states (for a 2-dimensional Hilbert space) that
cannot be extended to mixed states while respecting weighted averages.
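The non-uniqueness of the decomposition of a mixed state into pure states can be checked directly. The following sketch uses exact rational arithmetic from Python's standard library; the helper names `projector` and `mix` are ours, and the example (the maximally mixed state of a 2-dimensional system) is a standard illustration rather than one taken from this paper.

```python
from fractions import Fraction

def projector(v):
    """Rank-1 projection onto the line spanned by a real integer vector v."""
    norm_sq = sum(x * x for x in v)
    return [[Fraction(a * b, norm_sq) for b in v] for a in v]

def mix(weights, states):
    """Weighted average (convex combination) of density matrices."""
    n = len(states[0])
    return [[sum(w * s[i][j] for w, s in zip(weights, states))
             for j in range(n)] for i in range(n)]

half = Fraction(1, 2)

# Decomposition 1: equal mixture of the basis states |0> and |1>.
rho_z = mix([half, half], [projector((1, 0)), projector((0, 1))])

# Decomposition 2: equal mixture of |+> and |->, proportional to (1, 1)
# and (1, -1) respectively.
rho_x = mix([half, half], [projector((1, 1)), projector((1, -1))])

maximally_mixed = [[half, 0], [0, half]]
print(rho_z == maximally_mixed, rho_x == maximally_mixed)
```

Both mixtures yield the same density matrix although they are built from different pure states, so a hidden-variable representation of pure states must assign the same averaged ensemble to both mixtures if it is to extend consistently to this mixed state.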

To summarize this situation, we list four approaches to the issue
of what states (or preparations) should be given a hidden-variable
interpretation. (We use “mixed” here to mean “possibly mixed”; pure
states are included among the mixed ones.)

(1) Pure states, with no consistency requirement on the
representation.

(2) Pure states, subject to the consistency requirement allowing a
well-defined extension to mixed states, by respecting weighted
averages.

(3) Mixed states, with no consistency requirement.

(4) Mixed states, subject to the requirement of respecting weighted
averages.

In items 2 and 4, “respecting weighted averages” means that the
collection of sub-ensembles associated to a weighted mixture of some given
states is the corresponding weighted average of the sub-ensembles for
the given states. In item 2, respect for weighted averages serves as a
method for extending the hidden-variable interpretation from pure to
mixed states. In item 4, respect for weighted averages is a requirement
imposed on the assumed interpretation of mixed states. These two
items are equivalent, in the sense that the mixed-state interpretations
considered in item 4 are exactly the (unique) extensions to mixed states
of the pure-state interpretations considered in item 2.

The other two items in the list, items 1 and 3, are more liberal 
because they do not require any respect for weighted averages. 

The preceding list of four (or three in view of the equivalence between 
items 2 and 4) approaches could be doubled by including analogous 
versions with preparations in place of states. 

Notation 3. The concept of respecting weighted averages has several
names in the literature. The formal definition of the concept, namely
that the function $f$ in question satisfies $f(ax + by) = af(x) + bf(y)$
whenever $a$ and $b$ are nonnegative real numbers with sum 1, looks like
the definition of linearity except that it applies only to the restricted
options for $a$ and $b$ that produce weighted averages, also known as
convex combinations. Because of this, some authors, for example Spekkens
[16], use the term convex linear, and we shall follow this terminology.
Other authors prefer the shorter name affine, though this would seem
more natural for the related concept where $a$ or $b$ can be negative and
the only constraint on them is $a + b = 1$. In the first appendix we look
more closely at the notion of convex-linearity.
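As a quick worked check of the distinction drawn in Notation 3, consider the real function $f(x) = 2x + 1$; this function is our illustrative choice and does not come from the cited literature.

```latex
\begin{align*}
f(ax + by) &= 2(ax + by) + 1, \\
a f(x) + b f(y) &= a(2x + 1) + b(2y + 1) = 2(ax + by) + (a + b).
\end{align*}
```

The two sides agree exactly when $a + b = 1$, whether or not $a$ and $b$ are nonnegative. So this $f$ respects weighted averages, and indeed all combinations with $a + b = 1$, but it is not linear, since for example $f(0) \neq 0$.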

2.2. Measurements. We consider next the sorts of measurements
that a hidden-variable theory should explain. In quantum mechanics,
measurements are ordinarily represented by certain Hermitian
operators on the Hilbert space of states of a system. In this context, those
operators are usually called observables.

Before turning to the question of which operators should be treated
in a hidden-variable theory, we first address a prior issue, analogous to
the issue of state versus preparation in the previous subsection. The
analogous issue here is measurement versus apparatus. It is entirely
possible that different experimental arrangements measure the same
observable. In such a situation, those arrangements should produce
the same results (the same statistical distribution of measured values)
for any particular quantum state, but it is not clear that they should
produce the same results on each of the sub-ensembles considered in a
hidden-variable theory. Spekkens’s ontological models [16] assign
measurement values not to observables but to measurement procedures.
He introduces the name “measurement noncontextuality” for the
hypothesis that different measurement procedures for the same observable
result in the same outcomes. (Actually, he deals only with
measurements of effects; see below.)

When hidden-variable theories take observables to be the entities 
to be measured in their sub-ensembles, either because of an explicit 
assumption of measurement noncontextuality or because observables 
are built into the foundation of the theory, there still remains a choice 
as to which observables are to be considered and what is meant by 
measuring them. 

A traditional viewpoint is that observables are arbitrary^1 Hermitian
operators and that a measurement of such an operator in some state
produces a real number in the spectrum of the operator. For
simplicity, we shall pretend for a while that our Hilbert spaces are
finite-dimensional, so that a measurement produces an eigenvalue of the
operator. We shall see in Section 5 that no-go theorems for
finite-dimensional Hilbert spaces typically imply the corresponding theorems
for infinite-dimensional spaces, so in these cases our simplifying
assumption does not really lose generality. Quantum mechanics gives
well-known formulas for the probabilities of the various eigenvalues
and therefore also for quantities like the expectation of the measured
values.

For a hidden-variable theory to successfully match the predictions
of quantum mechanics, one would reasonably require it to predict, in
particular, the possible values of any measurement (namely the
eigenvalues of the observable being measured) and their respective
probabilities. It turns out, somewhat surprisingly, that several no-go
theorems work under considerably weaker demands on what the
hidden-variable theory must accomplish. Specifically, some theorems show
that a hidden-variable theory cannot even predict the correct values
for all observables, even if one doesn’t care about probabilities or even
the expectation values. Other theorems show that a hidden-variable
theory cannot even predict the correct expectations, even if one doesn’t
care about the particular values or probabilities. For brevity, we shall
refer to theorems of these two sorts as “value no-go” and “expectation
no-go” theorems, respectively.

^1 We ignore here the complications arising from superselection rules, which make
some Hermitian operators unobservable.

Another common view of measurements in quantum mechanics is
based not on observables but on particular Hermitian operators called
effects and on certain sets of effects called positive operator-valued
measures (POVMs). An effect is a Hermitian operator $E$ whose spectrum
lies in the interval $[0,1]$ of the real line. Among the effects are the
sharp effects, those whose spectrum is included in the two-element set
$\{0,1\}$. The sharp effects are simply the projection operators from the
Hilbert space onto its closed subspaces. Arbitrary effects are weighted
averages of sharp ones. A POVM is a set of effects whose sum is the
identity operator $I$. Notice that every effect is a member of at least
one POVM, namely $\{E, I - E\}$; unless $E = I$, it is also a member of
numerous other POVMs.

A POVM $\{E_k : k \in S\}$, where $S$ is some index set (usually finite),
is intended to model a measurement whose outcome is a member of
$S$, the probability of outcome $k$ for state $|\psi\rangle$ being $\langle\psi|E_k|\psi\rangle$, or, in
the case of mixed states with density matrix $\rho$, $\mathrm{Tr}(E_k\rho)$. Measurement
of an observable $A$ amounts to measurement of the POVM
consisting of the projections to the eigenspaces of $A$. Arbitrary POVMs are
more general in two respects, first that the effects in a POVM need
not be sharp, and second that these effects need not commute with one
another. Despite the additional generality, it is known that general
POVM measurements can be reduced to measurements of observables
in a larger Hilbert space, one in which the original Hilbert space is
isometrically embedded. For details, see for example [20, Section 5.1]
or [10]. Because actually measuring a general POVM can be a
complicated process, involving an enlargement of the original Hilbert space,
it is not clear that POVMs are so fundamental that a hidden-variable
theory should be required to produce correct predictions for them. In
particular, it is not clear that enlargement of the Hilbert space makes
sense for the sub-ensembles considered by such theories. It is therefore
preferable for no-go theorems to apply even when the hidden-variable
theory is required to work correctly only for those POVMs whose
measurement does not require enlarging the Hilbert space. Such POVMs
include, in particular, those consisting of mutually commuting, sharp
effects.
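The two respects in which general POVMs go beyond observables (non-sharp and non-commuting effects) can be seen in a small example. The sketch below uses exact rational arithmetic; the three effects, the density matrix, and the helper names are illustrative choices of ours, not objects from this paper.

```python
from fractions import Fraction as F

def mat_add(a, b):
    """Entrywise sum of two square matrices given as nested lists."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def mat_mul(a, b):
    """Matrix product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

# Three effects on a 2-dimensional space (real symmetric, spectrum in [0, 1]).
E1 = [[F(1, 2), F(0)], [F(0), F(0)]]               # half the projection onto |0>
E2 = [[F(1, 4), F(1, 4)], [F(1, 4), F(1, 4)]]      # half the projection onto |+>
E3 = [[F(1, 4), F(-1, 4)], [F(-1, 4), F(3, 4)]]    # the remainder I - E1 - E2
povm = [E1, E2, E3]

identity = [[F(1), F(0)], [F(0), F(1)]]
total = mat_add(mat_add(E1, E2), E3)

# A mixed state: trace 1 and positive determinant, hence a density matrix.
rho = [[F(3, 4), F(1, 4)], [F(1, 4), F(1, 4)]]

# Outcome probabilities Tr(E_k rho); they sum to 1 because the effects
# sum to the identity.
probabilities = [trace(mat_mul(E, rho)) for E in povm]
print(total == identity, sum(probabilities) == 1)
print(mat_mul(E1, E2) == mat_mul(E2, E1))  # do E1 and E2 commute?
```

Here none of the effects is a projection and $E_1$ does not commute with $E_2$, so this POVM is not the spectral decomposition of any single observable on the 2-dimensional space.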

It makes sense to speak of measuring a single effect $E$; this means
measuring the POVM $\{E, I - E\}$. In other words, it is a yes-or-no
measurement, with “yes” corresponding to $E$ and “no” to $I - E$. The
probability of the answer “yes” when effect $E$ is measured in a pure
state $|\psi\rangle$ is $\langle\psi|E|\psi\rangle$; for a mixed state with density matrix $\rho$,
it is the trace $\mathrm{Tr}(E\rho)$.

When a hidden-variable theory uses POVMs and effects as the
measurements for which values are predicted, we encounter a third notion of
noncontextuality, in addition to the preparation noncontextuality and
measurement noncontextuality mentioned above. The question here is
whether the measurement of an effect $E$ depends only on $E$ itself or on
the entire POVM of which $E$ is a member. For a quantum state, the
probability of getting “yes” when measuring $E$ depends only on $E$, but
that does not necessarily imply that the same situation obtains for all
the sub-ensembles within that state. The assertion that, even for the
sub-ensembles, it is only $E$ that matters, not the whole POVM, is the
third sort of noncontextuality.

This issue also arises, as made very clear in [12], when measurements
are given by observables rather than effects and POVMs.
Noncontextuality in this context means that the result of measuring an observable $A$
does not depend on what other observables might be measured along
with $A$. (In this framework, those other observables must commute
with $A$ and with each other, for otherwise they could not be measured
simultaneously. The framework does not envision enlarging the Hilbert
space.)

We shall use the word determinate to refer to all sorts of
noncontextuality. The intended meaning is that a hidden-variable theory’s
analysis of some aspect of quantum theory (such as states or
observables or effects) should be completely determined by what is explicitly
mentioned, regardless of other aspects of the situation (preparations
or apparatuses or other simultaneous measurements).

Remark 4. Before leaving the discussion of measurements, we point out,
to avoid possible confusion, that, although an effect $E$ is, in particular,
a Hermitian operator and thus an observable, measuring it as an effect
is quite different from measuring it as an observable. According to
quantum theory, the former always produces 1 (“yes”) or 0 (“no”); the
latter always produces one of the eigenvalues of $E$. The two sorts of
measurement coincide only when $E$ is a sharp effect.
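The contrast in Remark 4 can be illustrated numerically. The sketch below treats an effect that is diagonal in a fixed basis and describes a pure state only by its populations $|\langle i|\psi\rangle|^2$ in that basis, which suffices for diagonal operators; these simplifications and the function names are assumptions made for illustration.

```python
from fractions import Fraction as F

def outcomes_as_observable(eigenvalues, populations):
    """Distribution over eigenvalues when E is measured as an observable."""
    dist = {}
    for e, p in zip(eigenvalues, populations):
        dist[e] = dist.get(e, 0) + p
    return dist

def outcomes_as_effect(eigenvalues, populations):
    """Distribution over 1 ("yes") and 0 ("no") when E is measured as an effect."""
    p_yes = sum(e * p for e, p in zip(eigenvalues, populations))
    return {1: p_yes, 0: 1 - p_yes}

populations = [F(1, 2), F(1, 2)]   # state with equal weight on both basis vectors

# A non-sharp effect: eigenvalues lie in [0, 1] but are neither 0 nor 1.
non_sharp = [F(1, 4), F(3, 4)]
print(outcomes_as_observable(non_sharp, populations))  # outcomes 1/4 and 3/4
print(outcomes_as_effect(non_sharp, populations))      # outcomes 1 and 0

# A sharp effect (a projection): the two kinds of measurement coincide.
sharp = [F(0), F(1)]
print(outcomes_as_observable(sharp, populations) == outcomes_as_effect(sharp, populations))
```

In the non-sharp case the two measurements have the same expectation, 1/2 in this example, but different sets of possible outcomes.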

3. Expectation No-Go Theorems 

In this section, we discuss, clarify, and extend the work of Spekkens
[16] and of Ferrie et al. [6, 7, 8], which yields what we called
expectation no-go theorems above. That is, under suitable hypotheses, it
is shown that hidden-variable theories cannot correctly predict the
expectation values of effects. To describe and clarify the contents of these
papers, we begin with the earliest of them, [16], comment on various
aspects in need of clarification, and then indicate how these aspects are
developed in the three papers of Ferrie et al.

The papers under discussion differ somewhat in the hypotheses that
they explicitly assume, and they also differ in their names for the
theories that satisfy their hypotheses. We shall use the generic name
“probability representations” for these theories. In the rest of this
section, we shall describe in considerable detail the variations in content
of these theories; see Remark 6 for variations in the terminology.

3.1. Spekkens’s no-go theorem. The following definition is
essentially from [16], but see the commentary following the definition for
more details.

Definition 5. Given a Hilbert space $\mathcal{H}$, a probability representation
(Spekkens version) for quantum systems described by $\mathcal{H}$ consists of

• a measure space $\Lambda$,

• for every density operator $\rho$ on $\mathcal{H}$, a nonnegative real-valued
measurable function $\mu_\rho$ on $\Lambda$, normalized so that
$\int_\Lambda \mu_\rho(\lambda)\, d\lambda = 1$,

• for every POVM $\{E_k\}$, a set $\{\xi_{E_k}\}$ of nonnegative real-valued
measurable functions on $\Lambda$ that sum to the unit function on $\Lambda$,
subject to the requirement that, if $E_k = 0$, then the associated
function $\xi_{E_k}$ is identically zero,

such that for all density operators $\rho$ and all POVM elements $E_k$, we
have $\mathrm{Tr}(\rho E_k) = \int_\Lambda d\lambda\, \mu_\rho(\lambda)\, \xi_{E_k}(\lambda)$.

The intention behind this definition is that each point $\lambda \in \Lambda$
represents a particular sub-ensemble as provided by the hidden-variable
theory. A quantum state $\rho$ represents an ensemble composed of various
of these sub-ensembles, mixed together according to the probability
measure $\mu_\rho(\lambda)\, d\lambda$. When an effect $E$ is measured on a system from the
sub-ensemble $\lambda$, the probability of getting a “yes” answer is $\xi_E(\lambda)$.
Note that even a sharp effect can have, in a sub-ensemble $\lambda$, a
non-trivial probability of producing “yes”; the probability need not be 0 or
1. This is discussed in detail in the early part of [16].

Note that the last part of Definition 5 requires the expectation value
for an effect $E$ in a state $\rho$, as computed by quantum mechanics, namely
$\mathrm{Tr}(\rho E)$, to agree with the prediction of the hidden-variable theory, the
weighted average of the probabilities $\xi_E(\lambda)$ weighted according to the
composition $\mu_\rho$ of the state $\rho$. This is the only agreement demanded
here between quantum mechanics and a hidden-variable theory; that
is why we refer to the resulting no-go theorem as an expectation no-go
theorem.
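To see what the defining equation asks for, it may help to test it on a deliberately naive candidate. The sketch below takes $\Lambda = \{0, 1\}$ with counting measure and reads both $\mu_\rho$ and $\xi_E$ off the diagonals of the matrices. This toy model and all function names are our own assumptions for illustration; it is not a construction from [16] and, as the second check shows, it is not a probability representation of the full quantum system.

```python
from fractions import Fraction as F

def trace_product(a, b):
    """Tr(ab) for square matrices given as nested lists."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))

def mu(rho):
    """Toy distribution over Lambda = {0, 1}: the diagonal of rho."""
    return [rho[i][i] for i in range(len(rho))]

def xi(effect):
    """Toy response function over Lambda = {0, 1}: the diagonal of the effect."""
    return [effect[i][i] for i in range(len(effect))]

def model_expectation(rho, effect):
    """Sum over lambda of mu_rho(lambda) * xi_E(lambda) (counting measure)."""
    return sum(m * x for m, x in zip(mu(rho), xi(effect)))

# Diagonal state and diagonal effect: the toy model reproduces Tr(rho E).
rho_diag = [[F(3, 4), F(0)], [F(0), F(1, 4)]]
E_diag = [[F(1, 3), F(0)], [F(0), F(1)]]
print(trace_product(rho_diag, E_diag) == model_expectation(rho_diag, E_diag))

# rho = E = projection onto |+>: the quantum value and the toy value differ.
plus = [[F(1, 2), F(1, 2)], [F(1, 2), F(1, 2)]]
print(trace_product(plus, plus), model_expectation(plus, plus))
```

The toy model matches the quantum expectation only on the commuting (diagonal) fragment; the requirement in the definition is that the agreement hold for all density operators and all POVM elements at once.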

We have deviated here in several ways from Spekkens’s formulation
in [16], and we pause to explain the deviations. First, while giving the
definition, Spekkens explains “density operator” as “a positive
trace-class operator”. We take “density operator” in its usual meaning, which
requires that the trace of $\rho$ is 1. We assume that this is what Spekkens
intended, both because of the terminology and because of the required
normalization of $\mu_\rho$. If $\rho$ were an arbitrary trace-class operator, then
we would expect $\int_\Lambda \mu_\rho(\lambda)\, d\lambda$ to equal the trace of $\rho$ rather than 1.

Second, Spekkens refers to $\Lambda$ as a measurable space rather than a
measure space. The difference is that a measurable space consists just
of a set $\Lambda$ and a $\sigma$-algebra of subsets called the measurable sets; a
measure space has, in addition, a specific measure defined on this $\sigma$-algebra.
The integrals in the definition, both in the normalization condition for
$\mu_\rho$ and in the equation at the end of the definition, presuppose the
availability of a fixed measure denoted by $d\lambda$. So we assume that
Spekkens intended $\Lambda$ to be a measure space, and we have formulated
our definition accordingly.

Third, we have required the functions $\mu_\rho$ and $\xi_E$ to be measurable.
This requirement is needed in order for the integrals in the definition
to make sense.

Because the probability densities associated to states (density
operators) $\rho$ are given by functions $\mu_\rho$, they are, when considered as measures
on $\Lambda$, always absolutely continuous with respect to the fixed measure
$d\lambda$. This aspect of the definition does not seem well motivated. It
remains in force in [6] and [7], but in the more recent paper [8] it is
replaced by a broader viewpoint, taking $\Lambda$ to be a measurable space
(not a measure space, i.e., no fixed measure) and representing states $\rho$
by measures rather than by functions.

Remark 6. We already mentioned the ontological models from [16]:
these assign density functions $\mu$ and outcome functions $\xi$ to
preparations and measurement procedures, respectively, rather than to states
$\rho$ and effects $E$. The hypotheses for probability representations that we
gave above are what one obtains by adding to the notion of ontological
models the additional hypotheses of preparation noncontextuality and
measurement noncontextuality. In the same paper [16], Spekkens
introduces a notion of “quasiprobability representation”, which requires the
functions $\mu_\rho$ and $\xi_E$ to be determined independently of the preparation
of $\rho$ and the apparatus measuring $E$, but which allows these functions
to take negative values. Thus, our notion of probability representation
can be obtained by adjoining, to the notion of quasiprobability
representation, the additional hypothesis of nonnegativity. In other words,
the three notions of nonnegative quasiprobability representation,
noncontextual ontological model (both from [16]), and probability
representation (in our terminology) coincide. The coincidence of the first
two of these accounts for the title of [16].

In formulating Definition 5 we have retained one ambiguity from
[16], namely the third form of noncontextuality, mentioned in Section 2.
Does $\xi_E$ depend only on $E$ or also on the POVM that $E$ is a member
of? The notation $\xi_E$, which mentions only $E$ and not a whole POVM,
suggests the former, but the wording of the relevant clause in the
definition of “quasiprobability representation” in [16], “every POVM ...
is represented by a set ... of real-valued functions ...,” suggests the
latter. We adopt the former interpretation, that $\xi_E$ is determined by
$E$, for two reasons. First, the formulation of measurement
noncontextuality in [16] supports this interpretation. Second, this
interpretation seems to be essential for the proof of the no-go theorem
in [16].

To complete our discussion of the hypotheses in [16], one more
assumption needs to be discussed, namely convex-linearity. This
assumption is not present in the definitions of “quasiprobability
representation” and “ontological model” nor in the additional
assumptions of nonnegativity and noncontextuality. It is, however,
explicitly asserted both for density matrices and for effects as if it
were a necessary property of such models. Specifically, equations (7)
and (8) of [16] say that, for probability distributions $\{w_j\}$,

$$\text{if } \rho = \sum_j w_j \rho_j, \text{ then } \mu_\rho(\lambda) = \sum_j w_j \mu_{\rho_j}(\lambda)$$

and

$$\text{if } E = \sum_j w_j E_j, \text{ then } \xi_E(\lambda) = \sum_j w_j \xi_{E_j}(\lambda).$$

Spekkens gives a quite plausible argument for the first of these
equations, namely that an ensemble represented by the convex combination
$\rho$ can be prepared by first choosing a value of $j$ at random, with each
$j$ having probability $w_j$, and then preparing the corresponding state
$\rho_j$. The corresponding sub-ensembles should then be given by the
corresponding weighted mixtures of the sub-ensembles of the $\rho_j$'s.

The plausibility of convex-linearity might be reduced if one considers
the fact that the same $\rho$ can result from such a mixture in many
ways, so convex-linearity imposes some highly nontrivial constraints on
the $\mu_\rho$ functions. Any uneasiness resulting from this consideration
can, however, be ascribed to the assumption of preparation
noncontextuality rather than to convex-linearity. The uneasiness results
from the requirement that all the many ways to prepare a $\rho$ ensemble
must yield the same mixture of sub-ensembles.

Despite the plausibility of convex-linearity, it does not follow from
just the definitions in [16] or from our version, Definition 5 above.
To see this, suppose that the functions $\xi_E$ do not span the whole
space of square-integrable functions on $\Lambda$, so that there is a
function $\sigma$ orthogonal to all of these $\xi_E$'s, where “orthogonal”
means that $\int \sigma\,\xi_E\,d\lambda = 0$. One could then modify the
$\mu_\rho$ functions by adding to each one some multiple of $\sigma$,
obtaining $\mu'_\rho = \mu_\rho + c_\rho\sigma$ and still satisfying the
definitions. Here the coefficients $c_\rho$ can be chosen arbitrarily
for all of the density operators $\rho$. By choosing them in a sufficiently
incoherent way, one could arrange that
$\mu'_\rho(\lambda) \neq \sum_j w_j \mu'_{\rho_j}(\lambda)$.

If, on the other hand, the $\xi_E$'s do span the whole space of functions
on $\Lambda$, then Spekkens’s desired equation
$\mu_\rho(\lambda) = \sum_j w_j \mu_{\rho_j}(\lambda)$ does follow, for all
but a measure-zero set of $\lambda$'s, because the two sides of this
equation must give the same result when integrated against any $\xi_E$.

Unfortunately, nothing in the definitions requires the $\xi_E$'s to span
the whole space. For example, given any probability representation,
we can obtain another, physically equivalent one as follows. Replace
$\Lambda$ by the disjoint union $\Lambda_1 \sqcup \Lambda_2$ of two copies of
$\Lambda$. Define the measure of any subset of $\Lambda_1 \sqcup \Lambda_2$ to
be the average of the original measures of its intersections with the
two copies of $\Lambda$. Define all the functions $\mu_\rho$ and $\xi_E$ on the
new space by simply copying the original values on both of the
$\Lambda_i$'s. The result is a probability representation in which the
$\xi_E$'s span only the space of functions that are the same on the two
copies of $\Lambda$.
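
The doubling construction can be checked on a finite toy model. The following Python sketch is only an illustration under stated assumptions: the three-point space, its weights, and the particular functions `mu` and `xi` are hypothetical choices of ours, not taken from [16]. It confirms that copying the functions onto two half-weight copies preserves the normalization and the integral of their product, and that a function `sigma` equal to +1 on one copy and -1 on the other integrates to zero against the copied outcome function.

```python
from fractions import Fraction as F

# Hypothetical toy model: a 3-point space with a fixed measure (weights),
# one density function mu and one outcome function xi.
weights = [F(1, 2), F(1, 4), F(1, 4)]
mu = [F(1), F(1), F(1)]          # integrates to 1 against the weights
xi = [F(1, 3), F(1), F(0)]       # values in [0, 1]


def integrate(f, w):
    """Integral of f with respect to the discrete measure w."""
    return sum(fi * wi for fi, wi in zip(f, w))


def double(f):
    """Copy a function onto both copies of the space."""
    return f + f


# Each copy carries half of the original measure.
doubled_weights = [w / 2 for w in weights] * 2

# sigma is +1 on the first copy and -1 on the second copy.
sigma = [F(1)] * len(weights) + [F(-1)] * len(weights)

original = integrate([m * x for m, x in zip(mu, xi)], weights)
doubled = integrate(
    [m * x for m, x in zip(double(mu), double(xi))], doubled_weights
)
normalization = integrate(double(mu), doubled_weights)
orthogonal = integrate(
    [s * x for s, x in zip(sigma, double(xi))], doubled_weights
)
```

Here `original == doubled` expresses physical equivalence of the two representations, while `orthogonal == 0` exhibits a nonzero function orthogonal to the copied outcome function, so the copied functions cannot span the whole function space.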

Convex-linearity plays an important role in Spekkens’s proof of the
no-go theorem in [16], so, in order to support this proof, it should
be added either as a requirement in the definition of the probability
representations under consideration or as a hypothesis in the no-go
theorem.

Convex-linearity leads to another problem in [16]. Spekkens asserts
that, if a function $f$ is convex-linear on a convex set $S$ of operators
that span the space of Hermitian operators (and $f$ takes the value zero
on the zero operator if the latter is in $S$), then $f$ can be uniquely
extended to a linear function on this space. Unfortunately, such a
linear extension need not exist in the general case, when zero is not
in $S$.^ For a simple example, consider the function that is identically
1 on an $S$ that spans the space of Hermitian operators, does not contain
0, but does contain two orthogonal projections and their sum. Because
of this difficulty, we give, in an appendix, a careful discussion of
convex-linearity and its relation to linearity.

^Spekkens gives a formula purporting to define a linear extension of $f$ in general,
but it is not well-defined because it involves some arbitrary choices. He also gives,

The no-go theorem in [16] says that, when the Hilbert space $\mathcal{H}$
has dimension at least 2, there cannot be a probability representation
(Spekkens version), subject to the clarifications above, and satisfying
the additional hypothesis of convex-linearity both for states and for
effects. We give a careful proof of this theorem in Appendix B.

3.2. Ferrie and Emerson’s no-go theorems. We turn next to a
discussion of the papers [6, 7] of Ferrie and Emerson. These papers use
the notion of frames in Hilbert spaces, a generalization of the notion of
basis. We did not find frames useful, so we describe the relevant part
of these papers in a way that minimizes reference to frames.

In both [6] and [7], a quasiprobability representation of quantum
states^ is defined as a linear and invertible map $T$ from the space of
Hermitian operators on $\mathcal{H}$ to $L^2(\Lambda, \mu)$. Here $\Lambda$ is a
measure space^, with measure $\mu$, and $L^2(\Lambda, \mu)$ is the space of
real-valued, square-integrable functions on it, modulo equality
$\mu$-almost everywhere. Note that both $L^2(\Lambda, \mu)$ and the space of
Hermitian operators on $\mathcal{H}$ are real Hilbert spaces, the latter
having the inner product defined by $\mathrm{Tr}(AB)$. As far as we can see,
the motivation for using $L^2(\Lambda, \mu)$ rather than $L^1(\Lambda, \mu)$
comes neither from intuition nor from physics but rather from the
mathematical benefits of having a Hilbert space and from the authors’
desire to use the frame formalism.

The intuition behind a quasiprobability representation in this sense
is that each $\lambda \in \Lambda$ represents an assignment of possible values
to the hidden variables, or equivalently it represents one of the
sub-ensembles provided by the hidden-variable theory. For a density
operator $\rho$, the function $T(\rho)$ is the probability distribution on
sub-ensembles in the ensemble described by $\rho$.

in footnote 18 of the newer version of his paper, an argument purporting to
show that his formula is independent of those choices, but that argument fails.
It involves dividing by an appropriate constant $C$ to turn two nonnegative linear
combinations, the two sides of an equation, into convex combinations so that the
assumption of convex-linearity can be applied. But the necessary divisor $C$ may
need to be different for the two sides of the equation.

^Not of quantum mechanics but merely of quantum states. A representation of
quantum mechanics would also include an interpretation for measurements.

^We use the notation $\Lambda$ for consistency with [16] and [7]. The corresponding
space is called $\Gamma$ in [6] and in [8].

We believe that, when requiring $T$ to be invertible, the authors of
[6, 7] meant only to require that it be one-to-one, not that it be
surjective as the usual meaning of “invertible” would imply. In other
words, “invertible” was intended to mean merely “invertible on the
range of $T$.”
Comparing the work in these papers with our commentary on [16]
above, we note that in both [6] and [7], the definition of “density
operator” includes, as we expected, the requirement that the trace be 1;
the space $\Lambda$ is explicitly equipped with a fixed measure $\mu$
(corresponding to the implicit $d\lambda$ in [16]); and the functions
representing states and effects are required to be measurable. Because
states are represented by functions in the presence of the fixed
measure $\mu$, the probability distributions of the sub-ensembles within a
quantum state’s ensemble are always absolutely continuous with respect
to $\mu$, just as in [16].

Concerning the question whether an effect $E$ completely determines
the function $\xi_E$ or whether $\xi_E$ can depend also on the POVM in which
$E$ occurs, [6] contains the same ambiguity as [16], but [7] unambiguously
requires determinateness here: $\xi_E$ depends only on $E$.

Concerning convex-linearity, the situation in these papers [6, 7] is
rather complicated. As already indicated, the definition of a
quasiprobability representation of states in these papers explicitly
requires linearity. For the broader notion of a quasiprobability
representation of quantum mechanics (incorporating not just states but
effects), the discussion in [6, Section 3.2] begins in the context of
frame representations, which are necessarily linear. But it continues
with what the authors call a reformulation of the axioms of quantum
mechanics, and this reformulation does not mention convex-linearity.
Indeed, the axioms listed there are very similar to those of Spekkens
[16] that we put into Definition 5. Just as in our discussion of [16],
the axioms do not imply convex-linearity.

In [7, Section IV.B], we find a notion of “frame representation of
quantum theory” that implies linearity. Later, in Sections V.A and
V.B, there are notions of “classical representation of quantum theory”
and of “quasi-probability representation of quantum theory,” neither
of which mentions or implies convex-linearity. Lemma 2 in Section V.B
asserts that the mappings in a quasi-probability representation of
quantum theory are affine, but this lemma is incorrect. (The error in
the proof is the assumption, in the last displayed implication, that a
convex combination $p\mu_1 + (1 - p)\mu_2$ of $\mu$-functions representing
states is again such a $\mu$-function, so that the preceding displayed
implication can be applied to it.)

3.3. Ferrie, Morris, and Emerson’s no-go theorem. The difficulties
in [16, 6, 7] that we have pointed out here are resolved in [8]. In
the abstract and introduction of [8], the authors describe their
contribution as being primarily the extension of the earlier results
from finite-dimensional Hilbert spaces to infinite-dimensional ones. In
view of results to be presented below, showing that in many situations
no-go theorems for one Hilbert space automatically extend to similar
theorems for any larger Hilbert spaces, we regard this extension as
less important than the contribution in [8] of giving precise
formulations that correct the deficiencies of prior work.

The properties required of hidden-variable theories in [8] constitute
Definition 10 below, but before formulating this definition we need to
introduce notations for the spaces and subsets involved, and we need
to point out some relationships between these spaces.


Notation 7. • In the following, let $\Lambda$ be a measurable space.
Recall that this means that the set $\Lambda$ is equipped with a specified
$\sigma$-algebra $\Sigma$ of subsets.

• $\mathcal{F}(\Lambda, \Sigma)$, often abbreviated to simply $\mathcal{F}$, is the
space of bounded, measurable, real-valued functions on $\Lambda$. It is a
vector space over the real numbers, and we equip it with the supremum
norm, $\|f\| = \sup\{|f(\lambda)| : \lambda \in \Lambda\}$.

• $\mathcal{F}_{[0,1]}(\Lambda, \Sigma)$ or simply $\mathcal{F}_{[0,1]}$ is the subset
of $\mathcal{F}$ consisting of those functions whose values lie in the
interval [0,1].

• $\mathcal{M}(\Lambda, \Sigma)$, often abbreviated to simply $\mathcal{M}$, is the
space of bounded, signed, real-valued measures on $\Lambda$. It is a vector
space over the real numbers, and we equip it with the total
variation norm. That is, if $\mu \in \mathcal{M}$, then $\mu$ can be expressed
as $\mu_+ - \mu_-$, where $\mu_+$ and $\mu_-$ are positive measures with
disjoint supports (called the positive and negative parts of $\mu$). Then
$\|\mu\| = \mu_+(\Lambda) + \mu_-(\Lambda)$.

• $\mathcal{M}_{+1}(\Lambda, \Sigma)$ or simply $\mathcal{M}_{+1}$ is the subset of
$\mathcal{M}$ consisting of the probability measures, i.e., the positive
measures with total measure equal to 1.

• $\mathcal{H}$ is a complex Hilbert space.

• $\mathcal{B}(\mathcal{H})$, often abbreviated to simply $\mathcal{B}$, is the real
Banach space of bounded, self-adjoint operators
$\mathcal{H} \to \mathcal{H}$; its norm is the operator norm
$\|A\| = \sup\{\|Ax\| : x \in \mathcal{H}, \|x\| = 1\}$.

• $\mathcal{B}_{[0,1]}(\mathcal{H})$ or simply $\mathcal{B}_{[0,1]}$ is the subset of
$\mathcal{B}$ consisting of the effects, i.e., operators $A \in \mathcal{B}$ such
that both $A$ and $I - A$ are positive^, or equivalently such that the
spectrum of $A$ lies in the interval [0,1].

• $\mathcal{T}(\mathcal{H})$, often abbreviated simply $\mathcal{T}$, is the vector
subspace of $\mathcal{B}$ consisting of the (self-adjoint) trace-class
operators. These are the operators $A$ whose spectrum consists only of
(real) eigenvalues $\alpha_i$ (eigenvalues with multiplicity > 1 are
repeated in this list; the continuous spectrum is empty or {0}) such
that the sum $\sum_i |\alpha_i|$ is finite; this sum serves as the norm
$\|A\|$ of $A$ in $\mathcal{T}$. (Note that this norm is usually not equal to
the operator norm, the norm of $A$ in $\mathcal{B}$, which equals the supremum
of the $|\alpha_i|$.)

• $\mathcal{T}_{+1}(\mathcal{H})$ or simply $\mathcal{T}_{+1}$ is the subset of
$\mathcal{T}$ consisting of the density operators, positive operators of
trace 1.

Remark 8. We have modified some of the notations from [8]. In the
first place, we have removed a subscript $s$ from $\mathcal{T}$ and
$\mathcal{B}$. The subscript’s purpose was to indicate that these spaces
consist only of self-adjoint operators. Since we do not deal with more
general operators in this context, the subscript seemed superfluous.
Also, what we have called $\mathcal{F}_{[0,1]}$, $\mathcal{M}_{+1}$,
$\mathcal{B}_{[0,1]}$, and $\mathcal{T}_{+1}$ have in [8] the notations
$\mathcal{E}(\Lambda, \Sigma)$, $\mathcal{S}(\Lambda, \Sigma)$,
$\mathcal{E}(\mathcal{H})$, and $\mathcal{S}(\mathcal{H})$, respectively. The
double use of $\mathcal{E}$ and $\mathcal{S}$ served the useful purpose of
indicating which ingredients of quantum theory correspond to which
ingredients of a hidden-variable theory, but they also prevented any
abbreviations omitting $(\Lambda, \Sigma)$ or $\mathcal{H}$. We hope that our
notations will be easier to remember, since the main symbols
($\mathcal{F}$, $\mathcal{M}$, $\mathcal{B}$, $\mathcal{T}$) indicate the vector spaces
in which these subsets lie, while the subscripts hint at the
restriction that characterizes elements of the subset.

In contrast to [16, 6, 7] there is no specified measure on $\Lambda$. As
in these papers discussed earlier, a point $\lambda \in \Lambda$ represents
specific values for all the hidden variables, and thus represents a
specific sub-ensemble for the hidden-variable theory. A quantum state
will then be viewed as a mixture of such sub-ensembles according to a
probability measure on $\Lambda$, i.e., an element of $\mathcal{M}_{+1}$. This
approach avoids any assumption of absolute continuity of these measures
with respect to an a priori given measure; there simply is no a priori
given measure.


^$I$ is the identity operator. Positivity of a self-adjoint operator $A$ means that all
of its spectrum lies in the non-negative half of the real line. Equivalently, it means
that $\langle\psi|A|\psi\rangle \geq 0$ for all $|\psi\rangle \in \mathcal{H}$.

The normed vector spaces introduced above are connected by two
duality relations. First, every $f \in \mathcal{F}$ induces a continuous
linear functional $\bar f$ on $\mathcal{M}$ by integration:

$$\bar f(\mu) = \int_\Lambda f\,d\mu.$$

We shall not need to deal with the entire dual space^ $\mathcal{M}'$ of
$\mathcal{M}$ but only with the part consisting of functionals $\bar f$
arising from $\mathcal{F}$.

Second, the dual $\mathcal{T}'$ of $\mathcal{T}$ can be identified with
$\mathcal{B}$ as follows. Every $B \in \mathcal{B}$ induces a continuous linear
functional $\bar B$ on $\mathcal{T}$ by

$$\bar B(W) = \mathrm{Tr}(BW),$$

because the product of a bounded operator and a trace-class operator
is again in the trace class (i.e., the trace class is an ideal in the ring
of bounded operators). Furthermore, every bounded linear functional
on $\mathcal{T}$ arises in this way from a unique $B \in \mathcal{B}$. The
correspondence $B \mapsto \bar B$ is an isometric isomorphism between
$\mathcal{B}$ and $\mathcal{T}'$, and so one often identifies these two spaces.
For details about this duality, see, for example, [15, Theorem 23].

Remark 9. Note that this duality relationship is not symmetric. That
is, although each $W \in \mathcal{T}$ induces a continuous linear functional
on $\mathcal{B}$, namely $A \mapsto \mathrm{Tr}(AW)$, these will not be all of the
linear functionals on $\mathcal{B}$ unless $\mathcal{H}$ is finite-dimensional.

Spekkens [16] emphasizes a certain symmetry between states and
measurements, and, at the end of the paper, he seeks to give an
“even-handed” proof of a no-go theorem, respecting this symmetry. The
fact that $\mathcal{B}$, the space in which measurements live, is the dual of
$\mathcal{T}$, the space in which states live, but not vice versa, suggests
that the actual situation is not really symmetrical.

One reflection of this asymmetry arises when we try to prove a no-go
theorem for probability representations (Spekkens version) as defined
above. After building into that definition our clarifications and
corrections of Spekkens’s assumptions, the proof that we obtained, and
which we record in Appendix B below, is not even-handed in the sense
desired by Spekkens. We do not have any even-handed proof of an
expectation no-go theorem.

^In general, and even for nice measurable spaces like the real line $\mathbb{R}$ with the
$\sigma$-algebra of Borel sets, $\mathcal{M}'$ is an unpleasantly complicated space. In particular, in
this special case of $\mathbb{R}$, the linear functional assigning to each measure $\mu \in \mathcal{M}$ the
total measure of all the individual points is not of the form $\bar f$ for
any $f \in \mathcal{F}$. For more information about the dual of $\mathcal{M}$, see, for example, and
the references cited there.

The asymmetry in the duality relationship between $\mathcal{B}$ and
$\mathcal{T}$ is specific to the case of infinite-dimensional spaces. In the
case of finite-dimensional $\mathcal{H}$, say of dimension $d$, all bounded
linear operators are in the trace class, so $\mathcal{B}$ and $\mathcal{T}$ are
the same when considered just as vector spaces. Their norms, though not
identical, are equivalent in the sense that each is bounded by a
constant (depending on $d$) multiple of the other. They are identified
with the space of Hermitian $d \times d$ matrices.

To get a really smooth symmetry, though, one would need not only
that $\mathcal{H}$ is finite-dimensional but also that $\Lambda$ is finite. That
additional finiteness would make $\mathcal{M}$ and $\mathcal{F}$ dual to each
other and would avoid the messiness that arises in $\mathcal{M}'$ in the
general case. Unfortunately, finiteness of $\Lambda$ is quite a restrictive
assumption. Consider, for example, a spin-1 particle in an eigenstate
of the $z$-spin. The hidden variables in this situation would have to
determine the spin components in all directions other than $z$, and
there is a continuum of possibilities there. It seems that finiteness
of $\Lambda$ becomes plausible only if one can argue that, because of
limited precision of measurements, the spaces of measurement outcomes
can be discretized and thus treated as finite.

See below for further discussion of symmetry (or its absence) in the
light of some specific examples.

We are now in a position to present the notion that Ferrie et al.
[8] call a classical representation of quantum mechanics. We prefer
to call it a probability representation, viewing it as an updating and
clarification of the notion introduced in Definition 5.

Definition 10. A probability representation (Ferrie-Morris-Emerson
version) for quantum systems described by $\mathcal{H}$ consists of

• a measurable space $\Lambda$,

• a convex-linear map $T$ from the set $\mathcal{T}_{+1}$ of density matrices
into the set $\mathcal{M}_{+1}$ of probability measures, and

• a convex-linear map $S$ from the set $\mathcal{B}_{[0,1]}$ of effects into
the set $\mathcal{F}_{[0,1]}$ of measurable functions from $\Lambda$ to [0,1],

subject to, for all $\rho \in \mathcal{T}_{+1}$ and all $E \in \mathcal{B}_{[0,1]}$,

$$\mathrm{Tr}(\rho E) = \int_\Lambda S(E)\,dT(\rho).$$

The correspondence between this definition and the earlier
Definition 5 is that the measure $T(\rho)$ is what was previously written
$\mu_\rho(\lambda)\,d\lambda$, and $S(E)$ was previously $\xi_E$. The “trace equals
integral” requirement in the last clause of the definition still says
that the expectation of the effect $E$ in the state $\rho$ is the same
whether computed in quantum mechanics (the trace) or in the
hidden-variable theory (the integral).

Theorem 1 of [8] asserts that such a probability representation is
impossible (provided $\mathcal{H}$ has dimension at least 2). The proof has a
gap, which we fill in the next section, and we simultaneously make
some other improvements to the theorem and its proof.

3.4. Our Expectation No-Go Theorem. In this section, we prove
the first main result of this paper, a no-go theorem that strengthens
Theorem 1 of [8]. Our theorem and its proof are based on the result
in [8] but differ from it in two major respects. First, we use a weaker
hypothesis, requiring the existence of $S(E)$ only for (certain) sharp
effects $E$, not for all effects. We impose no convex-linearity assumption
on $S$. Second, we fill a gap that apparently resulted from quoting a
misstated fact in [1]. In addition to these changes, we also remove an
unnecessary paragraph in the otherwise terse proof.

The following definition, our final updating of the notion of
“probability representation,” expresses the hypotheses necessary for our
theorem. The conventions in Notation 7 remain in force.

Definition 11. A probability representation (our version) for quantum
systems described by $\mathcal{H}$ consists of

• a measurable space $\Lambda$,

• a convex-linear map $T$ from the set $\mathcal{T}_{+1}$ of density matrices
into the set $\mathcal{M}_{+1}$ of probability measures, and

• a map $S$ from the set of rank-1 projections in $\mathcal{H}$ into the set
$\mathcal{F}_{[0,1]}$ of measurable functions from $\Lambda$ to [0,1],

subject to, for all $\rho \in \mathcal{T}_{+1}$ and all rank-1 projections $E$,

$$\mathrm{Tr}(\rho E) = \int_\Lambda S(E)\,dT(\rho).$$

This definition differs from the previous version, Definition 10, in
that the domain of $S$ is no longer the set $\mathcal{B}_{[0,1]}$ of all effects
but the much smaller set of sharp effects of rank 1. The requirement
that $S$ be convex-linear is removed, because it would make no sense when
the domain of $S$ is not convex.

Remark 12. The restriction to sharp effects is significant because, as
explained in an earlier remark, measuring an effect $E$ is not in general
the same as measuring the observable that is given by the same
self-adjoint operator $E$. The two sorts of measurement are the same if
and only if $E$ is a sharp effect, i.e., a projection operator from
$\mathcal{H}$ to a closed subspace. Thus, sharp effects are the area common to
the effect-based hidden-variable notions considered in this section and
the observable-based hidden-variable theories to be discussed in
Section 4 below.

Definition 11 reduces the domain of $S$ not only to the set of sharp
effects but to the even smaller set of projections for which the rank,
the dimension of the range, is 1. This additional reduction is included
simply as a mathematical optimization of the theorem.

Since the domain of $S$ is, in Definition 11, no longer a convex set,
there is no requirement that $S$ be convex-linear. In principle, a quite
arbitrary function could serve as $S$, though, as we shall see in the
proof of the theorem below, the last clause of the definition, equating
a trace to an integral, implies a remnant of linearity for $S$, namely
that $S$ is one-to-one and its inverse is the restriction to the range of
$S$ of a linear transformation.

We now turn to our expectation no-go theorem, Theorem 1 in the
introduction, expressing it in the language of probability
representations.

Theorem 13. If the Hilbert space $\mathcal{H}$ has dimension at least 2, then
there is no probability representation (our version) for quantum systems
described by $\mathcal{H}$.

Proof. Suppose, toward a contradiction, that we have a probability
representation (our version), consisting of $\Lambda$, $T$, $S$, for some
$\mathcal{H}$ of dimension at least 2. We begin by working with the
convex-linear map $T : \mathcal{T}_{+1} \to \mathcal{M}_{+1}$, and our first objective
is to extend it to a linear map, still called $T$, from all of
$\mathcal{T}$ into $\mathcal{M}$. For general information about such extensions
of convex-linear maps, see the appendix on convex-linearity, but for the
case at hand it is convenient to give the following very specific
argument. Any trace-class self-adjoint operator $A \in \mathcal{T}$ can be
written as the difference of two positive trace-class operators
$A = A_+ - A_-$, where $A_+$ has the same positive eigenvalues and
corresponding eigenspaces as $A$ but is identically zero on all the
eigenspaces corresponding to non-positive eigenvalues. Similarly,
$-A_-$ matches the negative eigenvalues and eigenspaces of $A$; we reverse
its sign to get the positive operator $A_-$. As long as neither $A_+$ nor
$A_-$ is zero, we can multiply them by suitable scalars to produce
operators with trace 1, i.e., elements of $\mathcal{T}_{+1}$, and thus we can
write $A = bB - cC$ where $B, C \in \mathcal{T}_{+1}$ and $b, c$ are positive real
numbers. If one or both of $A_+$ and $A_-$ is zero, then we still have
such a formula for $A$ but one or both of $b$ and $c$ will be zero. So we
always have $A = bB - cC$ where $B, C \in \mathcal{T}_{+1}$ and $b, c \geq 0$. Note
for future reference that in this situation $\mathrm{Tr}(A) = b - c$ and, for
the particular construction of $A_\pm$, $B$, and $C$ given here,
$\|A\| = b + c$. (The norm here is that in $\mathcal{T}$, which we defined as
the sum of the absolute values of the eigenvalues.)

We extend $T$ to a map $T : \mathcal{T} \to \mathcal{M}$ by setting, with notation
as above, $T(A) = bT(B) - cT(C)$. Even though $A$ can have many
representations as $bB - cC$ with $B, C \in \mathcal{T}_{+1}$ and $b, c \geq 0$, they
all yield the same $T(A)$. Indeed, if $b'B' - c'C'$ is another such
representation, then from $bB - cC = A = b'B' - c'C'$, we obtain
$bB + c'C' = b'B' + cC$. Furthermore, since all of $B, C, B', C'$ have
trace 1, we also have $b + c' = b' + c$, and therefore

$$\frac{b}{b + c'}B + \frac{c'}{b + c'}C' = \frac{b'}{b' + c}B' + \frac{c}{b' + c}C.$$

Here, both sides are convex combinations, so convex-linearity of $T$
yields

$$\frac{b}{b + c'}T(B) + \frac{c'}{b + c'}T(C') = \frac{b'}{b' + c}T(B') + \frac{c}{b' + c}T(C).$$

Transposing some terms and clearing fractions (remembering that
$b + c' = b' + c$), we get

$$bT(B) - cT(C) = b'T(B') - c'T(C'),$$

which means that $T(A)$ is well-defined. An easy computation then
shows that $T$ is linear.
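
The decomposition $A = bB - cC$ used in this extension can be made concrete in finite dimensions. The Python sketch below assumes, purely for illustration, that $A$ is diagonal, so that it is given by its list of eigenvalues; the helper name `decompose` and the sample eigenvalues are our own hypothetical choices and are not part of the proof.

```python
from fractions import Fraction as F


def decompose(eigenvalues):
    """Split a diagonal self-adjoint operator A (given by its eigenvalues)
    as A = b*B - c*C with B, C of trace 1 and b, c >= 0."""
    pos = [max(a, F(0)) for a in eigenvalues]    # eigenvalues of A_+
    neg = [max(-a, F(0)) for a in eigenvalues]   # eigenvalues of A_-
    b, c = sum(pos), sum(neg)
    d = len(eigenvalues)
    # When a coefficient is zero, any density operator may be used;
    # we pick the projection onto the first basis vector.
    fallback = [F(1)] + [F(0)] * (d - 1)
    B = [p / b for p in pos] if b else fallback
    C = [n / c for n in neg] if c else fallback
    return b, B, c, C


A = [F(2), F(-1), F(1, 2)]
b, B, c, C = decompose(A)

reconstructed = [b * x - c * y for x, y in zip(B, C)]
trace = sum(A)                          # equals b - c
trace_norm = sum(abs(a) for a in A)     # equals b + c
```

For the sample eigenvalues one gets `b == 5/2` and `c == 1`, so that `trace == b - c` and `trace_norm == b + c`, matching the two identities recorded in the proof.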

We claim that $T$ is a bounded linear transformation. To this end,
consider some $A$ with $\|A\| \leq 1$ in $\mathcal{T}$. Then, as indicated above,
we can represent $A$ as $bB - cC$ with $B, C \in \mathcal{T}_{+1}$, with
$b, c \geq 0$, and with $b + c = \|A\| \leq 1$. Now $T(B)$ and $T(C)$ are
measures with norm 1 in $\mathcal{M}$. So $T(A) = bT(B) - cT(C)$ has norm at
most $b + c \leq 1$. This completes the proof that
$T : \mathcal{T} \to \mathcal{M}$ is a bounded linear transformation.

It follows that $T$ induces a bounded linear transformation on the
dual spaces, $T' : \mathcal{M}' \to \mathcal{T}'$. In detail, $T'$ sends any bounded
linear functional $h \in \mathcal{M}'$ (which means
$h : \mathcal{M} \to \mathbb{R}$) to the bounded linear functional
$T'(h) = h \circ T : \mathcal{T} \to \mathbb{R}$;

$$T'(h)(A) = h(T(A)) \quad \text{for all } h \in \mathcal{M}' \text{ and all } A \in \mathcal{T}.$$


Recall from the discussion in Subsection 3.3 how the dual space
$\mathcal{T}'$ of $\mathcal{T}$ is identified with $\mathcal{B}$ and part of the dual
space $\mathcal{M}'$ of $\mathcal{M}$ is identified with $\mathcal{F}$. Via these
identifications, $T' : \mathcal{M}' \to \mathcal{T}'$ restricts to a bounded
linear transformation, which we still call $T'$, from $\mathcal{F}$ to
$\mathcal{B}$. Untangling the definitions, we find that, for each
$f \in \mathcal{F}$, $T'(f)$ is the unique element of $\mathcal{B}$ that satisfies

$$\mathrm{Tr}(T'(f)A) = \int_\Lambda f\,dT(A) \quad \text{for all } A \in \mathcal{T}. \tag{1}$$

Indeed, the left side of this equation is the value obtained by applying
to $A \in \mathcal{T}$ the functional identified with $T'(f) \in \mathcal{B}$, while
the right side is the value obtained by applying to the measure $T(A)$
the functional identified with $f \in \mathcal{F}$.

Note also that this equation, though true for all $A \in \mathcal{T}$, would
still suffice to uniquely determine $T'(f)$ if it were asserted only for
$A \in \mathcal{T}_{+1}$; this is because, as we showed above, the linear span of
$\mathcal{T}_{+1}$ is the whole space $\mathcal{T}$.

We now invoke the last clause in Definition 11 to find that, for all
rank-1 projections $E$ and all $\rho \in \mathcal{T}_{+1}$,

$$\mathrm{Tr}(E\rho) = \int_\Lambda S(E)\,dT(\rho) = \mathrm{Tr}(T'(S(E))\rho).$$

But this is, as we saw in the preceding paragraph, enough to show that
$T'(S(E)) = E$.

Recall that we imposed no linearity conditions on $S$. Nevertheless,
because $T'$ is linear, this last equation gives what can be viewed as a
weak linearity requirement for $S$. On its range, $S$ is inverted by a
linear transformation $T'$.

So far, we have followed the argument in [8] fairly closely, just adding
some details, for example the reason why $T$ is bounded, and noting
that a drastically reduced domain of $S$ suffices. At this point, though,
Ferrie et al. claim, quoting Bugajski [1], that the linearity of $T'$
implies that it preserves a property called coexistence. Unfortunately,
this preservation claim needs not only that $T'$ is linear but also that
it preserves positivity and sends the constant function 1 to the
identity operator. $T'$ actually has these properties, but this needs to
be checked; we give the proof below. Also, although we could work with
the general notion of coexistence, it turns out to be more convenient to
use an equivalent formulation, from [5], for the special case of two
effects. (For readers interested in the general notion, we suggest [1]
and [9].)

In preparation for the next step in the proof, we need some
computations. The first of these is to compute $T'(1)$, where
$1 \in \mathcal{F}$ means the constant function with value 1. Referring to the
formula (1) characterizing $T'$ and remembering that it suffices to have
this formula for $A \in \mathcal{T}_{+1}$, we see that $T'(1)$ is the unique
bounded linear operator that satisfies, for all $\rho \in \mathcal{T}_{+1}$,

$$\mathrm{Tr}(T'(1)\rho) = \int_\Lambda dT(\rho) = T(\rho)(\Lambda) = 1 = \mathrm{Tr}(\rho) = \mathrm{Tr}(I\rho),$$

where the third equality comes from the fact that $T$ maps
$\mathcal{T}_{+1}$ into the space $\mathcal{M}_{+1}$ of probability measures. Thus,
$T'(1) = I$.

The other computation that we need is conveniently summarized in
the following lemma. Recall that a bounded linear operator $A$ is said
to be positive if $\langle\psi|A|\psi\rangle \geq 0$ for all
$|\psi\rangle \in \mathcal{H}$ and that $A \leq B$ means that $B - A$ is positive.

Lemma 14. If $f \in \mathcal{F}$ is nonnegative (meaning $f(\lambda) \geq 0$ for all
$\lambda \in \Lambda$), then $T'(f)$ is a positive operator. Therefore, if
$f \leq g$ pointwise in $\mathcal{F}$ then $T'(f) \leq T'(g)$ in $\mathcal{B}$.

Proof. The second assertion follows immediately from the first applied
to $g - f$, because $T'$ is linear. To prove the first assertion, suppose
$f \in \mathcal{F}$ is nonnegative, and let $|\psi\rangle$ be any vector in
$\mathcal{H}$. The conclusion we want to deduce,
$\langle\psi|T'(f)|\psi\rangle \geq 0$, is obvious if $|\psi\rangle = 0$, so we
may assume that $|\psi\rangle$ is a non-zero vector. Normalizing it, we may
assume further that its length is 1. Then
$|\psi\rangle\langle\psi| \in \mathcal{T}_{+1}$ and therefore
$T(|\psi\rangle\langle\psi|) \in \mathcal{M}_{+1}$. Using equation (1), we compute

$$\langle\psi|T'(f)|\psi\rangle = \mathrm{Tr}(T'(f)|\psi\rangle\langle\psi|) = \int_\Lambda f\,dT(|\psi\rangle\langle\psi|) \geq 0,$$

where we have used that both the measure $T(|\psi\rangle\langle\psi|)$ and the
integrand $f$ are nonnegative.^ □

The following lemma says, in view of a criterion of Heinosaari
[5, equation (2)], that any two elements of $\mathcal{F}_{[0,1]}$ coexist.

Lemma 15. If $f, g \in \mathcal{F}_{[0,1]}$ then there exists $h \in \mathcal{F}$ such
that all four of $h$, $f - h$, $g - h$, and $1 - f - g + h$ are nonnegative.

Proof. Define $h(\lambda) = \min\{f(\lambda), g(\lambda)\}$ for all
$\lambda \in \Lambda$. Then the first three of the assertions in the lemma are
obvious, and the fourth becomes obvious if we observe that
$f + g - h = \max\{f, g\} \leq 1$. □
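
As an informal sanity check of Lemma 15, the following Python sketch evaluates the four inequalities for $h = \min\{f, g\}$ on sampled values. The finite sample standing in for $\Lambda$, the grid of rational values, and the helper name `coexistence_witness` are assumptions made only for this illustration; exact rational arithmetic is used so that rounding cannot affect the comparisons.

```python
import random
from fractions import Fraction as F


def coexistence_witness(f, g):
    """Pointwise minimum of two functions given as lists of values."""
    return [min(a, b) for a, b in zip(f, g)]


random.seed(0)
# Two hypothetical functions with values in [0, 1] on a 1000-point sample.
f = [F(random.randint(0, 100), 100) for _ in range(1000)]
g = [F(random.randint(0, 100), 100) for _ in range(1000)]
h = coexistence_witness(f, g)

all_nonnegative = all(
    hv >= 0 and fv - hv >= 0 and gv - hv >= 0 and 1 - fv - gv + hv >= 0
    for fv, gv, hv in zip(f, g, h)
)
# The fourth inequality holds because f + g - h is the pointwise maximum.
max_identity = all(
    fv + gv - hv == max(fv, gv) for fv, gv, hv in zip(f, g, h)
)
```

Both `all_nonnegative` and `max_identity` evaluate to `True`, mirroring the two observations in the proof.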

Corollary 16. For any two rank-1 projections $A, B$ of $\mathcal{H}$, there
exists an operator $H \in \mathcal{B}$ such that all four of $H$, $A - H$,
$B - H$, and $I - A - B + H$ are positive operators.

Proof. Apply Lemma [15] with / = S{A) and g = S{B), let h be the 
function given by the lemma, and let H = T'{h). The nonnegativity of 
h, f — h, g — h, and 1 — f — g + h implies, by Lemma [HI the positivity 
of r{h) = H, r{S{A) -h) = A-H, T'iSiB) -h) = B-H, and 
r'(l — S{A) — S{B) + h) = I — A — B + H , where we have also used the 
linearity of T', the fact that T'(l) = J, and the formula T'{S{A)) = A 
for all A in the domain of S'. □ 

Let us apply this corollary to two specific rank-1 projections. Fix
two orthonormal vectors $|0\rangle$ and $|1\rangle$. (This is where we
use that $\mathcal{H}$ has dimension at least 2.) Let
$|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$. We use the projections
$A = |0\rangle\langle 0|$ and $B = |+\rangle\langle +|$ to the
subspaces spanned by $|0\rangle$ and $|+\rangle$. Let $H$ be as in
Corollary 16 for these projections $A$ and $B$.

Footnote (to the proof of Lemma 14): The proof would break down at its
final step if we were working with possibly negative quasiprobabilities.

From the positivity of $H$ and of $A - H$, we get that
$0 \le \langle 1|H|1\rangle$ and that

$$0 \le \langle 1|(A - H)|1\rangle = \langle 1|A|1\rangle - \langle 1|H|1\rangle = -\langle 1|H|1\rangle,$$

where we have used that $|1\rangle$, being orthogonal to $|0\rangle$,
is annihilated by $A$. Combining the two inequalities, we infer that
$\langle 1|H|1\rangle = 0$ and therefore, since $H$ is positive,
$H|1\rangle = 0$. Similarly, using the orthogonal vectors $|+\rangle$
and $|-\rangle = (|0\rangle - |1\rangle)/\sqrt{2}$ in place of
$|0\rangle$ and $|1\rangle$, we obtain $H|-\rangle = 0$. So, being
linear, $H$ is identically zero on the subspace of $\mathcal{H}$
spanned by $|1\rangle$ and $|-\rangle$; note that $|0\rangle$ is in
this subspace, so we have $H|0\rangle = 0$.
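The inner products behind this argument can be checked with exact
rational arithmetic. The sketch below is only an illustration under
simplifying assumptions: it works in dimension 2 with real matrices,
where $H|1\rangle = H|-\rangle = 0$ forces $H = 0$, and it also
evaluates the one positivity condition of Corollary 16 not used so far.
The helper `quad` is a hypothetical name introduced here.

```python
from fractions import Fraction as F


def quad(vec, M):
    """Return <vec|M|vec> for a real vector and a real symmetric matrix."""
    n = len(vec)
    return sum(vec[r] * M[r][c] * vec[c] for r in range(n) for c in range(n))


I2 = [[1, 0], [0, 1]]
A = [[1, 0], [0, 0]]                            # |0><0|
B = [[F(1, 2), F(1, 2)], [F(1, 2), F(1, 2)]]    # |+><+|
H = [[0, 0], [0, 0]]                            # forced in dimension 2

ket0, ket1 = [1, 0], [0, 1]
minus_dir = [1, -1]   # unnormalized |->; scaling does not change a zero value

# A annihilates |1>, and B annihilates |->.
print(quad(ket1, A), quad(minus_dir, B))

# The remaining condition: I - A - B + H would have to be positive.
remaining = [[I2[r][c] - A[r][c] - B[r][c] + H[r][c] for c in range(2)]
             for r in range(2)]
print(quad(ket0, remaining))
```

The last printed value is negative, so $I - A - B + H$ cannot be a
positive operator for this choice of $A$ and $B$.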

Now we use the part of Corollary 16 that has not yet been used,
namely the positivity of $I - A - B + H$. Since $H|0\rangle = 0$, we
can compute

$$0 \le \langle 0|(I - A - B + H)|0\rangle = \langle 0|0\rangle - \langle 0|A|0\rangle - \langle 0|B|0\rangle = 1 - 1 - \left(\frac{1}{\sqrt{2}}\right)^2 = -\frac{1}{2}.$$

This contradiction completes the proof of the theorem. □

4. Value No-Go Theorems

We turn now to a different species of no-go theorems, ones saying
that hidden-variable theories cannot even produce the correct outcomes
for individual measurements, let alone the correct probabilities or
expectation values. Such theorems considerably predated the expectation
no-go theorems considered in the preceding section. Value no-go
theorems were first established by Bell [1, 2] and then by Kochen and
Specker [11]; we shall also refer to the user-friendly exposition given
by Mermin [12].

Note that there is no implication in either direction between value
no-go theorems and expectation no-go theorems. The former say that
a hidden-variable theory cannot predict the correct values for measured
quantities, but it might still predict the correct expectations; the
latter say that a hidden-variable theory cannot predict the correct
expectations, but it might still predict the correct values.

Of course, in order to formulate value no-go theorems, one must
specify what “correct outcomes for individual measurements” means.
For this purpose, we need the notion of the joint spectrum of commuting
operators on Hilbert space, and we devote the next subsection to
summarizing the basic facts about joint spectra.

4.1. Joint Spectra. A general reference for the notion of joint
spectrum is [3, Section 6.5].

Let $A_1, \dots, A_n$ be a finite list of pairwise commuting,
self-adjoint operators on a Hilbert space $\mathcal{H}$. The notion of
the joint spectrum of such a list is a natural generalization of the
notion of the spectrum of a single self-adjoint operator.

The simplest case occurs when the operators are simultaneously
diagonalizable, i.e., when $\mathcal{H}$ admits an orthonormal basis
consisting of common eigenvectors of all the $A_i$'s. In this case, the
joint spectrum consists of the $n$-tuples of scalars
$\nu = (\nu_1, \dots, \nu_n) \in \mathbb{R}^n$ that occur as the
eigenvalues for such common eigenvectors. That is, $\nu$ belongs to
the joint spectrum if and only if there is a non-zero vector
$|\psi\rangle \in \mathcal{H}$ such that
$A_i|\psi\rangle = \nu_i|\psi\rangle$ for $i = 1, \dots, n$.
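In the simultaneously diagonalizable case the joint spectrum can be read
off directly. The following sketch assumes each operator is given by its
list of diagonal entries in one fixed common eigenbasis; this
representation and the name `joint_spectrum` are choices made for the
illustration, not notation from the text.

```python
def joint_spectrum(diagonals):
    """Joint spectrum of operators that are diagonal in a common eigenbasis.

    diagonals[i][k] is the eigenvalue of A_i on the k-th basis vector, so
    the k-th basis vector contributes the tuple (diagonals[0][k], ...).
    """
    return set(zip(*diagonals))


A1 = [1, 1, 2]   # diag(1, 1, 2)
A2 = [0, 5, 5]   # diag(0, 5, 5)
print(sorted(joint_spectrum([A1, A2])))
```

Note that $(2, 0)$ does not occur, although 2 is in the spectrum of the
first operator and 0 is in the spectrum of the second: the joint
spectrum is in general a proper subset of the product of the individual
spectra.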

If $\mathcal{H}$ is finite-dimensional, then this simple case is the
only one that can arise, but for infinite-dimensional $\mathcal{H}$ we
must take into account the possibility of a continuous spectrum
(instead of, or in addition to, the discrete spectrum given by
eigenvectors). A point $\nu \in \mathbb{R}^n$ belongs to the joint
spectrum $\sigma(A_1, \dots, A_n)$ of $A_1, \dots, A_n$ if and only if
it is approximately a tuple of eigenvalues in the following sense:
For every positive $\varepsilon$, there is a unit vector
$|\psi\rangle \in \mathcal{H}$ (an approximate simultaneous
eigenvector) such that, for each $i = 1, \dots, n$, we have
$\| A_i|\psi\rangle - \nu_i|\psi\rangle \| < \varepsilon$.

The joint spectrum of a tuple of self-adjoint operators is a closed
subset of $\mathbb{R}^n$. If the operators are bounded, then so is
their joint spectrum.

Just as for a single operator, there is a spectral decomposition leading
to a functional calculus for tuples of commuting self-adjoint operators.
In more detail, there is a unique spectral measure $E$, a countably
additive map from Borel subsets of $\mathbb{R}^n$ to projection
operators on $\mathcal{H}$, such that, for each $i$,

$$A_i = \int x_i \, dE(x_1, \dots, x_n).$$

The joint spectrum $\sigma(A_1, \dots, A_n)$ can be characterized as
the support of this spectral measure, i.e., the set of points
$\nu \in \mathbb{R}^n$ such that $E(B) \ne 0$ for all neighborhoods $B$
of $\nu$.

The preceding information about joint spectra is explicit in
[3, Section 6.5]. (For the boundedness of the joint spectrum of
commuting bounded operators, look at the proof of Theorem 1 in that
section.) What follows is implicit in the statement, on page 155 of [3],
that most of Section 1, Subsection 4, which concerns functions of a
single operator, can be repeated in the present context of several
commuting operators. We fill in some arguments that are not given in
that subsection of [3].

Given a Borel function $f : \mathbb{R}^n \to \mathbb{R}$, one defines

$$f(A_1, \dots, A_n) = \int f(x_1, \dots, x_n) \, dE(x_1, \dots, x_n).$$

We shall use this notion only for continuous $f$, and in this case we
have the following useful information.

Proposition 17. Let $A_1, \dots, A_n$ be commuting, self-adjoint
operators, with joint spectrum $\sigma(A_1, \dots, A_n)$. Then, for any
continuous $f : \mathbb{R}^n \to \mathbb{R}$, we have
$f(A_1, \dots, A_n) = 0$ if and only if $f$ vanishes identically on
$\sigma(A_1, \dots, A_n)$. Furthermore, a point $\nu \in \mathbb{R}^n$
belongs to $\sigma(A_1, \dots, A_n)$ if and only if every continuous
function $f : \mathbb{R}^n \to \mathbb{R}$ that satisfies
$f(A_1, \dots, A_n) = 0$ also satisfies $f(\nu) = 0$.

Proof. Although we have two “if and only if” statements to prove,
their “only if” halves say the same thing, so we need only to prove
three implications:

(1) If a continuous function $f : \mathbb{R}^n \to \mathbb{R}$
vanishes identically on $\sigma(A_1, \dots, A_n)$, then
$f(A_1, \dots, A_n) = 0$.

(2) If $f(A_1, \dots, A_n) = 0$ for a continuous $f$ and if
$\nu \in \sigma(A_1, \dots, A_n)$, then $f(\nu) = 0$.

(3) If $\nu \notin \sigma(A_1, \dots, A_n)$, then there is a
continuous $f : \mathbb{R}^n \to \mathbb{R}$ with
$f(A_1, \dots, A_n) = 0$ but $f(\nu) \ne 0$.

Item (1) here is clear from the definition of
$f(A_1, \dots, A_n)$. It is the integral of $f$ with respect to $E$,
and $f$ vanishes on the support of $E$.

For item (2), we use the generalization to several commuting operators
of a fact from the cited subsection of [3], namely that

$$\| f(A_1, \dots, A_n) \| = E\text{-}\sup\{ |f(\nu)| : \nu \in \sigma(A_1, \dots, A_n) \}.$$

Here the notation $E$-sup means the essential supremum with respect to
the spectral measure $E$, which is the infimum of all the numbers
$\alpha$ such that $E(\{\nu : |f(\nu)| > \alpha\}) = 0$. In the
situation of item (2), we therefore have that this essential supremum
is zero. Suppose now, toward a contradiction, that
$\nu \in \sigma(A_1, \dots, A_n)$ is a point for which
$f(\nu) \ne 0$. Since $f$ is continuous and $f(\nu) \ne 0$, there is
an open neighborhood $N$ of $\nu$ such that, for all $x \in N$,
$|f(x)| > \frac{1}{2}|f(\nu)| > 0$. Since the essential supremum of
$|f|$ is zero, there is an $\alpha < \frac{1}{2}|f(\nu)|$ for which
$E(\{x : |f(x)| > \alpha\}) = 0$. But the set
$\{x : |f(x)| > \alpha\}$ includes $N$, so $E(N) = 0$. This is a
contradiction, because every neighborhood $N$ of a point $\nu$ in the
joint spectrum must have $E(N) \ne 0$.

Finally, to prove item (3), suppose
$\nu \notin \sigma(A_1, \dots, A_n)$ and notice that, thanks to item
(1), we need only find a continuous $f$ that vanishes identically on
$\sigma(A_1, \dots, A_n)$ but does not vanish at $\nu$. Since
$\sigma(A_1, \dots, A_n)$ is closed, the function sending each point in
$\mathbb{R}^n$ to its distance from $\sigma(A_1, \dots, A_n)$ does the
job. □

The last assertion in Proposition 17 can be summarized as: The
joint spectrum of $A_1, \dots, A_n$ consists of all those points
$(\nu_1, \dots, \nu_n)$ that satisfy all the same equations as the
operators themselves. Here “equations” should be understood as
equations between continuous functions.

Just as the points in the spectrum of a single Hermitian operator $A$
are, according to quantum theory, the possible results of a measurement
of $A$, so the points in the joint spectrum of $A_1, \dots, A_n$ are
the possible outcomes of a simultaneous measurement of all of
$A_1, \dots, A_n$. Note that both mathematics and physics require the
operators $A_1, \dots, A_n$ here to commute: mathematics in order that
the joint spectrum be defined, and physics in order that these
observables be simultaneously measurable.

We record, for future reference, some very special cases of the
definition of joint spectrum. These all fall under the simple case
mentioned at the beginning of this subsection: the operators will be
simultaneously diagonalizable, so the joint spectrum consists of the
eigenvalues for the common eigenvectors of the operators
$A_1, \dots, A_n$. If the $A_i$ are projections, then each point in
their joint spectrum is a tuple of zeros and ones. If
$A_1, \dots, A_n$ are the rank-1 projections to an orthogonal set of
directions, then their joint spectrum contains all the $n$-tuples
consisting of a single one and $n - 1$ zeros. The only other point that
could be in the joint spectrum is the $n$-tuple of all zeros; it is
present if and only if the directions to which the $A_i$'s project do
not span the whole space $\mathcal{H}$.
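These special cases can be illustrated in $\mathbb{R}^3$. The sketch
below is an assumption-laden toy: directions are given as unnormalized
integer vectors, a common orthogonal eigenbasis is supplied by hand, and
the eigenvalue of each projection on a basis vector is computed as a
Rayleigh quotient (which equals the eigenvalue on an eigenvector). The
function names are hypothetical.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rayleigh(direction, b):
    """<b|P|b> / <b|b>, where P is the rank-1 projection onto span(direction)."""
    return Fraction(dot(direction, b) ** 2, dot(direction, direction) * dot(b, b))


def joint_spectrum_of_projections(directions, eigenbasis):
    """Tuples of eigenvalues of the projections on each common eigenvector."""
    return {tuple(rayleigh(d, b) for d in directions) for b in eigenbasis}


u, w, e3 = (1, 1, 0), (1, -1, 0), (0, 0, 1)   # pairwise orthogonal
eigenbasis = [u, w, e3]

# Two directions do not span R^3, so the all-zero tuple appears.
print(sorted(joint_spectrum_of_projections([u, w], eigenbasis)))
# Three directions span R^3, so only tuples with a single one appear.
print(sorted(joint_spectrum_of_projections([u, w, e3], eigenbasis)))
```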

4.2. Value Maps. Now we are ready to define precisely what is
expected of a hidden-variable theory in order for it to predict the
correct values for observables. The following definition, which is
based on the discussion in [12, Section II], is intended to provide
that specification.

Definition 18. Let $\mathcal{H}$ be a Hilbert space, and let
$\mathcal{O}$ be a set of observables, i.e., self-adjoint operators on
$\mathcal{H}$. A value map for $\mathcal{O}$ in $\mathcal{H}$ is a
function $v$ assigning to each observable $A \in \mathcal{O}$ a number
$v(A)$ in the spectrum of $A$, in such a way that, whenever
$A_1, \dots, A_n$ are pairwise commuting elements of $\mathcal{O}$,
then $(v(A_1), \dots, v(A_n))$ is in the joint spectrum of
$(A_1, \dots, A_n)$.

The intention behind this definition is that, in a hidden-variable
theory, a quantum state represents an ensemble of individual systems,
each of which has definite values for observables. That is, each
individual system has a value map associated to it, describing what
values would be obtained if we were to measure observable properties of
the system. A believer in such a hidden-variable theory would expect a
value map for the largest possible $\mathcal{O}$, the set of all
self-adjoint operators on $\mathcal{H}$, unless there were
superselection rules rendering some such operators unobservable.

The part of Definition 18 about pairwise commuting operators says
exactly that, if one measures the observables $A_1, \dots, A_n$
simultaneously, which is possible because they commute, then the values
one obtains should be among the possibilities permitted by quantum
mechanics, namely the $n$-tuples in the joint spectrum of the operators.

On the other hand, for observables that do not commute, quantum
mechanics does not allow them to be simultaneously exactly measured,
does not describe possible simultaneous values, and thus does not
impose restrictions on value maps.

4.3. No-Go Theorem. A hidden-variable theory should do more than
just provide some value maps describing the properties of the
sub-ensembles inside the quantum states. It should provide, for each
quantum state $\rho$, a probability distribution $p_\rho$ over the set
of value maps that accounts for the measured values of observables in
$\mathcal{O}$. The precise meaning of “accounts for” is as follows. For
each observable $A \in \mathcal{O}$, there is a probability
distribution $P_\rho$ induced on the spectrum of $A$ by

$$P_\rho(X) = p_\rho(\{ v : v(A) \in X \})$$

for all subsets $X$ of the spectrum of $A$. This induced probability
distribution should agree with the probability distribution predicted
by quantum theory for the observable $A$ in the state $\rho$.

One would thus expect that a no-go theorem in this context would
say that there is no way to assign, to each state, an appropriate
probability distribution over value maps. Surprisingly, the no-go
theorems of Bell [1, 2] and Kochen and Specker [11] are far stronger.
They say that, for $\mathcal{H}$ of dimension at least 3, there are no
value maps at all for $\mathcal{H}$ and the set
$\mathcal{O}_{\mathrm{all}}$ of all self-adjoint operators on
$\mathcal{H}$. Better yet, there are no value maps for certain specific
finite(*) subsets $\mathcal{O}$ of $\mathcal{O}_{\mathrm{all}}$.

(*) In the case of finite-dimensional $\mathcal{H}$, where each
observable has only a finite spectrum, we can use the compactness
theorem of propositional logic to infer, from the no-go theorem for
$\mathcal{O}_{\mathrm{all}}$, that there is also a no-go theorem for
some finite $\mathcal{O} \subseteq \mathcal{O}_{\mathrm{all}}$. The
compactness argument does not, however, produce a specific example of
such an $\mathcal{O}$.

We strengthen this result by tightly restricting the sort of observables
that are needed in $\mathcal{O}$. This is Theorem 2 from the
introduction.

Theorem 19. Suppose that the dimension of the Hilbert space is at
least 3.

(1) There is a finite set $\mathcal{O}$ of projections for which no
value map exists.

(2) If the dimension is finite then there is a finite set
$\mathcal{O}$ of rank 1 projections for which no value map exists.

The desired finite sets of projections are constructed explicitly in the
proof.

Remark 20. The assumption in part (2) of Theorem 19 that the dimension
of $\mathcal{H}$ is finite cannot simply be omitted. If
$\dim(\mathcal{H})$ is infinite, then the set $\mathcal{O}$ of all
finite-rank projections admits a value map, namely the constant zero
function. This works because the definition of “value map” imposes
constraints on only finitely many observables at a time.

Proof. We start with proving Theorem 19(2), i.e., part (2) of
Theorem 19. Arguably the result is implicit in [2, Section 5] but it is
not explicitly stated there and no specific $\mathcal{O}$ of the
desired sort is given. In [11] and [12], the result is explicitly
proved for 3-dimensional $\mathcal{H}$, but the extension to larger
$\mathcal{H}$, which is easy if one just wants to extend a general
no-go theorem, is not quite so obvious under the restriction to
finitely many rank-1 projections. Because of this situation, we outline
both versions of the proof, referring to these older papers for much
of the work but filling in the additional arguments needed to get our
result.

Proof of Theorem 19(2) following Bell. Bell [2, Section 5] works from
three basic properties of (what we call) a value map $v$, namely

(1) For every rank-1 projection $|\psi\rangle\langle\psi|$ (where
$|\psi\rangle$ is a unit vector), $v(|\psi\rangle\langle\psi|)$ is 0
or 1.

(2) If $v(|\varphi\rangle\langle\varphi|) = 1$ and $|\psi\rangle$ is
orthogonal to $|\varphi\rangle$, then
$v(|\psi\rangle\langle\psi|) = 0$.

(3) If $v(|\psi_1\rangle\langle\psi_1|) = v(|\psi_2\rangle\langle\psi_2|) = 0$
for two orthogonal unit vectors $|\psi_1\rangle$ and $|\psi_2\rangle$,
then also $v(|\psi\rangle\langle\psi|) = 0$ for all unit vectors
$|\psi\rangle$ of the form $\alpha|\psi_1\rangle + \beta|\psi_2\rangle$.

All three of these follow from the definition of value map provided
$\mathcal{O}$ contains all of the rank-1 projections of $\mathcal{H}$.
Property (1) is immediate from the fact that the spectrum of a
non-trivial projection is included in $\{0, 1\}$. Similarly, Property
(2) follows from the facts that, if $|\varphi\rangle$ and
$|\psi\rangle$ are orthogonal, then the projections
$|\varphi\rangle\langle\varphi|$ and $|\psi\rangle\langle\psi|$ commute
and their joint spectrum is $\{(0, 0), (0, 1), (1, 0)\}$. (If
$\mathcal{H}$ were only 2-dimensional, this joint spectrum would be
only $\{(0, 1), (1, 0)\}$, but Property (2) would still follow for the
same reason: $(1, 1)$ is not in the joint spectrum.)

To prove Property (3), complete $\{|\psi_1\rangle, |\psi_2\rangle\}$
to an orthonormal basis for $\mathcal{H}$, say
$\{|\psi_1\rangle, |\psi_2\rangle, \dots, |\psi_n\rangle\}$. The
associated rank-1 projections $|\psi_i\rangle\langle\psi_i|$ commute,
and their joint spectrum consists of the vectors in which one component
is 1 and all the rest are 0. So we must have
$v(|\psi_i\rangle\langle\psi_i|) = 1$ for some $i > 2$. But then the
desired equation in (3) follows from (2) because
$\alpha|\psi_1\rangle + \beta|\psi_2\rangle$ is orthogonal to
$|\psi_i\rangle$. (This argument appears to require
$\dim(\mathcal{H}) \ge 3$ in order to have a $|\psi_i\rangle$ to work
with here, but this appearance is wrong. If $\dim(\mathcal{H}) = 2$
then Property (3) holds vacuously because
$\{|\psi_1\rangle, |\psi_2\rangle\}$ is an orthonormal base for
$\mathcal{H}$, so $v$ must send one of the associated projections to 1.
The real use of $\dim(\mathcal{H}) \ge 3$ comes later.)

Bell deduces from these three properties and
$\dim(\mathcal{H}) \ge 3$ that $v$ is continuous. More explicitly, he
shows that, if $v(|\varphi\rangle\langle\varphi|) = 0$ and
$v(|\psi\rangle\langle\psi|) = 1$, for unit vectors $|\varphi\rangle$
and $|\psi\rangle$, then
$\| |\varphi\rangle - |\psi\rangle \| > \frac{1}{2}$. His argument
involves applying the three properties to some auxiliary vectors in
addition to $|\varphi\rangle$ and $|\psi\rangle$. Bell completes the
proof of the no-go theorem by observing that, since $v$ must take both
values 0 and 1, this continuity result is a contradiction. So there
cannot be a value map defined on all of the rank-1 projections.

For our purposes, namely producing a finite set $\mathcal{O}$ of
rank-1 projections with no value map, we must work a bit more. Using
the fact that $\dim(\mathcal{H})$ is finite and at least 2, start with
an orthonormal base $O_1$ for $\mathcal{H}$ and enlarge it to a finite
superset $O_2$ with the property that every two vectors
$|\varphi\rangle, |\psi\rangle \in O_2$ can be joined by a chain in
$O_2$,

$$|\varphi\rangle = |\chi_0\rangle, |\chi_1\rangle, \dots, |\chi_m\rangle = |\psi\rangle,$$

in which the distance between any two consecutive terms is at most
$\frac{1}{2}$. So, for each two consecutive terms, Bell's argument
gives us
$v(|\chi_i\rangle\langle\chi_i|) = v(|\chi_{i+1}\rangle\langle\chi_{i+1}|)$.
Of course, the argument involves the auxiliary vectors mentioned above,
in addition to these two consecutive $|\chi\rangle$'s, but there are
only finitely many of these auxiliary vectors. Adjoin all of those
vectors, for all $i$, to $O_2$ to get the final $\mathcal{O}$. If $v$
were a value map for $\mathcal{O}$, then, by Bell's argument, we would
have $v$ constant on the rank-1 projections associated to the vectors
in $O_2$ and therefore in particular the vectors in the orthonormal
base $O_1$. That is absurd, because a value map, when applied to the
projections associated to an orthonormal base, always produces a single
1 and the rest 0's. So $\mathcal{O}$ is as required by the theorem. □

Proof of Theorem 19(2) following Kochen-Specker and Mermin. When
the dimension of $\mathcal{H}$ is exactly 3, the constructions given by
Kochen and Specker [11] and Mermin [12, Section IV] provide the desired
$\mathcal{O}$. More precisely, the proof of Theorem 1 in [11] uses a
Boolean algebra generated by a finite set of one-dimensional subspaces
of $\mathcal{H}$, and it shows that the projections to those subspaces
constitute an $\mathcal{O}$ of the required sort. Mermin works instead
with squares $S_i^2$ of certain spin-components of a spin-1 particle,
but these are projections to 2-dimensional subspaces of $\mathcal{H}$,
and the complementary rank-1 projections $I - S_i^2$ serve as the
desired $\mathcal{O}$.

When the dimension of $\mathcal{H}$ is greater than 3, but still
finite, we shall see in Theorem 21 below how to bootstrap the result
from lower to higher dimensions. Notice that, if one merely wants a
no-go theorem saying that some $\mathcal{O}$ has no value map, then
this bootstrapping is easy, as noted in the cited literature. Work is
needed only to get all the operators in $\mathcal{O}$ to be rank 1
projections. □
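For a concrete feel for what a finite set of rank-1 projections with no
value map looks like, the following sketch brute-forces a small example.
It does not use the sets from the proofs above. Instead it assumes the
well-known 18-vector configuration in dimension 4 usually attributed to
Cabello, Estebaranz, and García-Alcaine, written here from memory as
nine orthogonal bases; the code checks orthogonality itself rather than
taking it on trust. It tests only a necessary condition on value maps:
on each orthogonal basis, exactly one projection receives the value 1.

```python
from itertools import product

# Nine orthogonal bases of R^4 (vectors left unnormalized). There are 18
# distinct vectors in total, and each appears in exactly two bases.
BASES = [
    [(0, 0, 0, 1), (0, 0, 1, 0), (1, 1, 0, 0), (1, -1, 0, 0)],
    [(0, 0, 0, 1), (0, 1, 0, 0), (1, 0, 1, 0), (1, 0, -1, 0)],
    [(1, -1, 1, -1), (1, -1, -1, 1), (1, 1, 0, 0), (0, 0, 1, 1)],
    [(1, -1, 1, -1), (1, 1, 1, 1), (1, 0, -1, 0), (0, 1, 0, -1)],
    [(0, 0, 1, 0), (0, 1, 0, 0), (1, 0, 0, 1), (1, 0, 0, -1)],
    [(1, -1, -1, 1), (1, 1, 1, 1), (1, 0, 0, -1), (0, 1, -1, 0)],
    [(1, 1, -1, 1), (1, 1, 1, -1), (1, -1, 0, 0), (0, 0, 1, 1)],
    [(1, 1, -1, 1), (-1, 1, 1, 1), (1, 0, 1, 0), (0, 1, 0, -1)],
    [(1, 1, 1, -1), (-1, 1, 1, 1), (1, 0, 0, 1), (0, 1, -1, 0)],
]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bases_are_orthogonal(bases):
    """True if, within every basis, all pairs of vectors are orthogonal."""
    return all(dot(u, v) == 0
               for basis in bases
               for k, u in enumerate(basis)
               for v in basis[k + 1:])


def basis_condition_satisfiable(bases):
    """Search all 0/1 assignments for one giving exactly one 1 per basis."""
    vectors = sorted({v for basis in bases for v in basis})
    index = {v: k for k, v in enumerate(vectors)}
    rows = [[index[v] for v in basis] for basis in bases]
    for bits in product((0, 1), repeat=len(vectors)):
        if all(sum(bits[k] for k in row) == 1 for row in rows):
            return True
    return False


print(bases_are_orthogonal(BASES), basis_condition_satisfiable(BASES))
```

The search over $2^{18}$ assignments finds none, which also follows from
a parity count: each vector lies in two bases, so the number of ones
summed over the nine bases would be even, yet it would have to equal 9.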

Proof of Theorem 19(1). The case where $\dim(\mathcal{H})$ is finite
was covered by Theorem 19(2), so it remains to treat the case of
infinite-dimensional $\mathcal{H}$.

Let $\mathcal{K}$ and $\mathcal{L}$ be Hilbert spaces, with
$\dim(\mathcal{K}) = 3$ and $\dim(\mathcal{L}) = \dim(\mathcal{H})$.
Note that then their tensor product $\mathcal{K} \otimes \mathcal{L}$
has the same dimension as $\mathcal{H}$, so it can be identified with
$\mathcal{H}$.

Let $\mathcal{O}$ be as in Theorem 19 for the 3-dimensional
$\mathcal{K}$. Let

$$\mathcal{O}' = \{ P \otimes I_{\mathcal{L}} : P \in \mathcal{O} \},$$

where $I_{\mathcal{L}}$ is the identity operator on $\mathcal{L}$. Then
$\mathcal{O}'$ is a set of infinite-rank projections of
$\mathcal{K} \otimes \mathcal{L} = \mathcal{H}$, having the same
algebraic structure as $\mathcal{O}$. It follows that there is no value
map for $\mathcal{O}'$. □

This completes the proof of Theorem 19. □

We note that the measurements involved in Theorem 19(2), namely
the rank-1 projections, are the same as those involved in our
expectation no-go Theorem 13. We hope that, by reducing both species of
no-go theorems to an extremely simple sort of measurement, and
furthermore a sort where measurement as observable and measurement as
effect coincide, we have clarified the similarities as well as the
differences between the two species.

5. Bootstrapping the dimension

Our objective in this section is to show that, in many cases, a no-go
theorem for a Hilbert space $\mathcal{H}$ automatically yields no-go
theorems for larger Hilbert spaces, ones that contain $\mathcal{H}$ as
closed subspaces. The section has independent value and can be read
independently except that it needs the definition of value map and two
definitions (Spekkens's and ours) of probability representation.

Intuitively, such dimension bootstrapping results are to be expected.
If hidden-variable theories could explain the behavior of quantum
systems described by the larger Hilbert space, say $\mathcal{H}'$, then
they could also provide an explanation for systems described by the
subspace $\mathcal{H}$. The latter systems are, after all, just a
special case of the former, consisting of the pure states that happen
to lie in $\mathcal{H}$ or mixtures of such states.

The no-go theorems under discussion here, both ours (Theorems 13
and 19) and those from the previous literature
([16, 6, 7, 8, 1, 2, 11, 12]), give much more information than just the
impossibility of matching the predictions of quantum mechanics with a
hidden-variable theory. They establish that hidden-variable theories
must fail in very specific ways. It is not so obvious that these
specific sorts of failures, once established for a Hilbert space
$\mathcal{H}$, necessarily also apply to its superspaces.

We shall prove two theorems saying that no-go results for a Hilbert
space $\mathcal{H}'$ follow directly from no-go results for a subspace
$\mathcal{H}$. The two theorems differ in the sort of no-go results
that they apply to; one is for expectation no-go results as in
Theorem 13, the other is for value no-go results as in Theorem 19. We
shall also comment on the situation for the results in [16, 6, 7].

We begin with the theorem dealing with value no-go results. This
is the most important part of this section, because it was used in the
proof of Theorem 19(2) above. There, we invoked constructions from
the literature proving the result for $\mathcal{H}$ of dimension 3 but
we claimed the result for all finite dimensions from 3 up. That claim
is supported by the following theorem.

Theorem 21. Suppose $\mathcal{H} \subseteq \mathcal{H}'$ are
finite-dimensional Hilbert spaces. Suppose further that $\mathcal{O}$
is a finite set of rank-1 projections of $\mathcal{H}$ for which no
value map exists. Then there is a finite set $\mathcal{O}'$ of rank-1
projections of $\mathcal{H}'$ for which no value map exists.

Proof. Clearly, if two Hilbert spaces are isomorphic and if one of them
has a finite set $\mathcal{O}$ of rank-1 projections with no value map,
then the other also has such a set. It suffices to conjugate the
projections in $\mathcal{O}$ by any isomorphism between the two spaces.
Thus, the existence of such a set $\mathcal{O}$ depends only on the
dimension of the Hilbert space, not on the specific space.

Proceeding by induction on the dimension of $\mathcal{H}'$, we see
that it suffices to prove the theorem in the case where
$\dim(\mathcal{H}') = \dim(\mathcal{H}) + 1$. Given such $\mathcal{H}$
and $\mathcal{H}'$, let $|\psi\rangle$ be any unit vector in
$\mathcal{H}'$, and observe that its orthogonal complement,
$|\psi\rangle^\perp$, is a subspace of $\mathcal{H}'$ of the same
dimension as $\mathcal{H}$ and thus isomorphic to $\mathcal{H}$. By the
induction hypothesis, this subspace $|\psi\rangle^\perp$ has a finite
set $\mathcal{O}$ of rank-1 projections for which no value map exists.
Each element of $\mathcal{O}$ can be regarded as a rank-1 projection of
$\mathcal{H}'$; indeed, if the projection was given by
$|\varphi\rangle\langle\varphi|$ in $|\psi\rangle^\perp$, then we can
just interpret the same formula $|\varphi\rangle\langle\varphi|$ in
$\mathcal{H}'$, using the same unit vector
$|\varphi\rangle \in |\psi\rangle^\perp$.

Let $\mathcal{O}_1$ consist of all the projections from $\mathcal{O}$,
interpreted as projections of $\mathcal{H}'$, together with one
additional rank-1 projection, namely $|\psi\rangle\langle\psi|$. What
can a value map $v$ for $\mathcal{O}_1$ look like? It must send
$|\psi\rangle\langle\psi|$ to one of its eigenvalues, 0 or 1.

Suppose first that $v(|\psi\rangle\langle\psi|) = 0$. Then, using the
fact that $|\psi\rangle\langle\psi|$ commutes with all the other
elements of $\mathcal{O}_1$, we easily compute that what $v$ does to
those other elements amounts to a value map for $\mathcal{O}$. But
$\mathcal{O}$ was chosen so that it has no value map, and so we cannot
have $v(|\psi\rangle\langle\psi|) = 0$. Therefore
$v(|\psi\rangle\langle\psi|) = 1$. (It follows that $v$ maps the
projections associated to all the other elements of $\mathcal{O}_1$ to
zero, but we shall not need this fact.)

We have thus shown that any value map for the finite set
$\mathcal{O}_1$ must send $|\psi\rangle\langle\psi|$ to 1. Repeat the
argument for another unit vector $|\psi'\rangle$ that is orthogonal to
$|\psi\rangle$. There is a finite set $\mathcal{O}_2$ of rank-1
projections such that any value map for $\mathcal{O}_2$ must send
$|\psi'\rangle\langle\psi'|$ to 1. No value map can send both
$|\psi\rangle\langle\psi|$ and $|\psi'\rangle\langle\psi'|$ to 1,
because $(1, 1)$ is not in their joint spectrum, which contains only
$(0, 0)$, $(1, 0)$, and $(0, 1)$. Therefore, there can be no value map
for the union $\mathcal{O}_1 \cup \mathcal{O}_2$, which thus serves as
the $\mathcal{O}'$ required by the theorem. □


The finiteness of $\dim(\mathcal{H}')$ is essential in this theorem.
If the theorem were true for infinite-dimensional $\mathcal{H}'$, then
the same would be the case for part (2) of Theorem 19, contrary to
Remark 20. The next theorem, in contrast, does not require dimensions
to be finite.

Theorem 22. Let $\mathcal{H}'$ be a Hilbert space and $\mathcal{H}$ a
closed subspace of $\mathcal{H}'$. From any probability representation
(our version) for quantum systems described by $\mathcal{H}'$, one can
directly construct such a representation for systems described by
$\mathcal{H}$.

Strictly speaking, this theorem is vacuous, since Theorem 13 says
that there is no probability representation (our version) for quantum
systems described by any Hilbert space of dimension $\ge 2$. The
intention, however, is that the construction here is considerably
easier than that in Theorem 13. In particular, if we knew Theorem 13
only for 2-dimensional $\mathcal{H}$, this would suffice to get the
full Theorem 13. This fact supports our assessment, in Section 3, that
the careful development and rigorous proofs in [8] are a greater
contribution than the extension to infinite-dimensional Hilbert spaces.
(Additional support will come later in this section.)


Proof. We construct a probability representation (our version)
$\Lambda$, $T$, and $S$ for quantum systems described by $\mathcal{H}$
(with notation as in Definition 11) from any such representation
$\Lambda'$, $T'$, and $S'$ for the larger Hilbert space $\mathcal{H}'$.
To begin, we set $\Lambda = \Lambda'$.

To define $T$ and $S$, we use the inclusion map
$i : \mathcal{H} \to \mathcal{H}'$, sending each element of
$\mathcal{H}$ to itself considered as an element of $\mathcal{H}'$, and
we use the adjoint $p : \mathcal{H}' \to \mathcal{H}$, which is the
orthogonal projection of $\mathcal{H}'$ onto $\mathcal{H}$. Given any
density operator $\rho \in \mathcal{T}_{+1}(\mathcal{H})$, we can
expand it to a density operator
$\bar\rho = i \circ \rho \circ p \in \mathcal{T}_{+1}(\mathcal{H}')$.
Note that this expansion is very natural: If $\rho$ corresponds to a
pure state $|\psi\rangle \in \mathcal{H}$, i.e., if
$\rho = |\psi\rangle\langle\psi|$, then $\bar\rho$ corresponds to the
same $|\psi\rangle \in \mathcal{H}'$. If, on the other hand, $\rho$ is
a mixture of states $\rho_i$, then $\bar\rho$ is the mixture, with the
same coefficients, of the $\bar\rho_i$. Define
$T : \mathcal{T}_{+1}(\mathcal{H}) \to \mathcal{M}_{+1}(\Lambda)$ by
$T(\rho) = T'(\bar\rho)$.

The definition of $S$ is similar. Notice that, if $E$ is a rank-1
projection in $\mathcal{H}$, then $\bar E = i \circ E \circ p$ is a
rank-1 projection in $\mathcal{H}'$. So we can define
$S(E) = S'(\bar E)$. Again, the passage from $E$ to $\bar E$ is very
natural. If $E$ projects to the one-dimensional subspace spanned by
$|\psi\rangle \in \mathcal{H}$, then $\bar E$ projects to the same
subspace, now considered as a subspace of $\mathcal{H}'$.

This completes the definition of $\Lambda$, $T$, and $S$. Most of the
requirements in Definition 11 are trivial to verify. For the last
requirement, the agreement between the expectation computed as a trace
in quantum mechanics and the expectation computed as an integral in the
probability representation, it is useful to notice first that
$p \circ i$ is the identity operator on $\mathcal{H}$. We can then
compute, for any $\rho \in \mathcal{T}_{+1}(\mathcal{H})$ and any
rank-1 projection $E$ on $\mathcal{H}$,

$$\begin{aligned}
\int_\Lambda S(E) \, dT(\rho) &= \int_{\Lambda'} S'(\bar E) \, dT'(\bar\rho) \\
&= \mathrm{Tr}(\bar\rho \bar E) \\
&= \mathrm{Tr}(i \circ \rho \circ p \circ i \circ E \circ p) \\
&= \mathrm{Tr}(i \circ \rho \circ E \circ p) \\
&= \mathrm{Tr}(\rho \circ E \circ p \circ i) \\
&= \mathrm{Tr}(\rho \circ E),
\end{aligned}$$

as required. □
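The trace identity at the end of this proof can be checked on a small
example. The sketch below is illustrative only: it takes $\mathcal{H}$
to be a real 2-dimensional space embedded in a 3-dimensional
$\mathcal{H}'$ as the first two coordinates, represents $i$ and $p$ as
matrices, and uses exact fractions. The particular $\rho$ and $E$ are
arbitrary choices, and the helper names are hypothetical.

```python
from fractions import Fraction as F


def matmul(X, Y):
    """Plain matrix product of nested lists."""
    return [[sum(X[r][k] * Y[k][c] for k in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]


def trace(X):
    return sum(X[k][k] for k in range(len(X)))


i = [[1, 0], [0, 1], [0, 0]]   # inclusion H -> H' (3 x 2)
p = [[1, 0, 0], [0, 1, 0]]     # its adjoint, the projection H' -> H (2 x 3)


def extend(X):
    """The passage X -> i o X o p from operators on H to operators on H'."""
    return matmul(matmul(i, X), p)


rho = [[F(3, 4), F(1, 4)], [F(1, 4), F(1, 4)]]   # a density matrix on H
E = [[F(1, 2), F(1, 2)], [F(1, 2), F(1, 2)]]     # rank-1 projection on H

print(matmul(p, i))                               # identity on H
print(trace(matmul(extend(rho), extend(E))), trace(matmul(rho, E)))
```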


To finish this section, we briefly discuss the possibility of
transferring no-go theorems as in [16, 6, 7] from a Hilbert space
$\mathcal{H}$ to a larger space $\mathcal{H}'$. To be specific, we
consider probability representations (Spekkens version) as in
Definition 5, subject to the assumptions of determinateness ($\xi_E$
depends only on the effect $E$, not on the POVM containing it) and
convex-linearity of both of the maps $\rho \mapsto \mu_\rho$ and
$E \mapsto \xi_E$.

Proposition 23. Let $\mathcal{H}$ be a closed subspace of the Hilbert
space $\mathcal{H}'$. If $\mathcal{H}'$ admits a probability
representation (Spekkens version) satisfying determinateness and
convex-linearity, then so does $\mathcal{H}$.

Proof. At first, it might seem that we can proceed exactly as in the
proof of Theorem 22, transforming the density operators $\rho$ and
effects $E$ of the subspace $\mathcal{H}$ to density operators
$\bar\rho = i \circ \rho \circ p$ and effects
$\bar E = i \circ E \circ p$ on the superspace $\mathcal{H}'$, and then
using this transformation to convert a probability representation
(Spekkens version) for $\mathcal{H}'$, say $\Lambda'$, $\mu'$, $\xi'$,
to one for $\mathcal{H}$. In detail, we would use the same measure
space, $\Lambda = \Lambda'$, and we would set
$\mu_\rho = \mu'_{\bar\rho}$ and $\xi_E = \xi'_{\bar E}$.

This approach works well as far as $\rho$ and $\mu_\rho$ are concerned,
but there is a problem with $E$ and $\xi_E$. Definition 5 requires
that, if $\{E_k : k \in K\}$ is a POVM, i.e., if the effects $E_k$ have
sum $I$, then $\sum_k \xi_{E_k}(\lambda) = 1$ for all
$\lambda \in \Lambda$. Given that $\xi'$ satisfies this requirement on
$\mathcal{H}'$, we want that $\xi$ satisfies it on $\mathcal{H}$. So we
would like to argue that, if $\{E_k : k \in K\}$ is a POVM in
$\mathcal{H}$, then $\{\bar E_k : k \in K\}$ is a POVM in
$\mathcal{H}'$, which would give us that
$\sum_k \xi_{E_k}(\lambda) = \sum_k \xi'_{\bar E_k}(\lambda) = 1$.
Unfortunately, $\{\bar E_k : k \in K\}$ will not be a POVM for
$\mathcal{H}'$ (unless $\mathcal{H} = \mathcal{H}'$). Indeed, using the
fact that $\{E_k : k \in K\}$ is a POVM, we can compute

$$\sum_k \bar E_k = \sum_k i \circ E_k \circ p = i \circ \Bigl(\sum_k E_k\Bigr) \circ p = i \circ I \circ p = i \circ p.$$

Here $i \circ p$ is the transformation of $\mathcal{H}'$ that projects
orthogonally to the subspace $\mathcal{H}$; it is not the identity
unless $\mathcal{H} = \mathcal{H}'$.

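To correct the problem, we modify the definition of $\bar E$ as
follows. Fix an arbitrary unit vector $|\alpha\rangle \in \mathcal{H}$.
Then define $\bar E$ to be the unique linear operator on $\mathcal{H}'$
such that

$$\bar E|\psi\rangle = \begin{cases} E|\psi\rangle & \text{if } |\psi\rangle \in \mathcal{H}, \\ \langle\alpha|E|\alpha\rangle \, |\psi\rangle & \text{if } |\psi\rangle \perp \mathcal{H}. \end{cases}$$

In other words, $\bar E$ agrees with $E$ on $\mathcal{H}$ and with a
scalar multiple of the identity on the orthogonal complement of
$\mathcal{H}$, the multiplier of the identity being
$\langle\alpha|E|\alpha\rangle$. Another way to write $\bar E$ uses the
operator $I - i \circ p$, which projects $\mathcal{H}'$ onto the
orthogonal complement of $\mathcal{H}$; we have

$$\bar E = i \circ E \circ p + \langle\alpha|E|\alpha\rangle (I - i \circ p).$$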

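This new version of $\bar E$ overcomes the problem with the old one,
because, if $\sum_k E_k = I$ then, because $|\alpha\rangle$ is a unit
vector, $\sum_k \langle\alpha|E_k|\alpha\rangle = 1$ and

$$\sum_k \bar E_k = \sum_k i \circ E_k \circ p + \sum_k \langle\alpha|E_k|\alpha\rangle (I - i \circ p) = i \circ p + 1 \cdot (I - i \circ p) = I.$$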

Furthermore, this extension process from E to E sends the identity 
and zero operators on "H to the identity and zero operators on Ti', and 
the process respects weighted averages. Using the new extension pro¬ 
cess, we define and we claim that the result is a probability 

representation (Spekkens version) for Ti. The only non-trivial thing 
to check is the hnal requirement that the quantum-theoretic expecta¬ 
tion values Tr(pF') agree with the hidden-variable theory’s expectation 
values / d\ Pp{X)^eW- We compute 



dX fip{X)^E{X) 


[ dAp'(A)e^(A) 

Ja 

Ti{pE) 

Ti{ipp ■ {iEp + (alula)(/ — ip))) 
Tri^ippiEp) -|- (a|U|a)Tr(zpp(/ — ip)). 


The first term here was computed earlier and found to be $\mathrm{Tr}(\rho E)$,
which is the desired result, so it remains to check that the second term
vanishes. Up to a factor $\langle\alpha|E|\alpha\rangle$, it is

\[ \mathrm{Tr}(i \rho p - i \rho p\, i p) = \mathrm{Tr}(i \rho p - i \rho p) = 0, \]

where we have used that $p \circ i$ is the identity operator of $\mathcal{H}$. This completes
the proof of the proposition. □
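The extension used in this proof can be checked numerically in a small case. The following is a minimal sketch of our own, not part of the original argument: it assumes $\mathcal{H} = \mathbb{C}^2$ embedded in $\mathcal{H}' = \mathbb{C}^3$ as the first two coordinates, and the two-element POVM, the state, and all function names are illustrative choices. It confirms that the old extension $i \circ E \circ p$ fails to sum to the identity, that the corrected extension does, and that the expectation values agree.

```python
# Pure-Python check of the extension E -> E_bar (illustrative assumptions:
# H = C^2 inside H' = C^3; POVM {E1, E2} and state RHO chosen arbitrarily).

def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(s, A):
    return [[s * a for a in row] for row in A]

def trace(A):
    return sum(A[k][k] for k in range(len(A)))

def identity(n):
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]

I_INCL = [[1, 0], [0, 1], [0, 0]]   # i : H -> H' (inclusion)
P_PROJ = [[1, 0, 0], [0, 1, 0]]     # p : H' -> H (orthogonal projection)
ALPHA = [1.0, 0.0]                  # the fixed unit vector |alpha> in H

def expectation(vec, E):
    """<vec|E|vec> for a real vector (sufficient for this sketch)."""
    n = len(vec)
    return sum(vec[r] * E[r][c] * vec[c] for r in range(n) for c in range(n))

def old_extension(E):
    """i o E o p; also used for rho_bar = i rho p."""
    return matmul(matmul(I_INCL, E), P_PROJ)

def new_extension(E):
    """i o E o p + <alpha|E|alpha> (I - i o p)."""
    ip = matmul(I_INCL, P_PROJ)
    complement = add(identity(3), scale(-1.0, ip))
    return add(old_extension(E), scale(expectation(ALPHA, E), complement))

E1 = [[0.7, 0.1], [0.1, 0.2]]            # an effect on H
E2 = add(identity(2), scale(-1.0, E1))   # its complement, so E1 + E2 = I
RHO = [[0.6, 0.2], [0.2, 0.4]]           # a density matrix on H

old_sum = add(old_extension(E1), old_extension(E2))   # equals i o p, not I
new_sum = add(new_extension(E1), new_extension(E2))   # equals I on H'
lhs = trace(matmul(old_extension(RHO), new_extension(E1)))  # Tr(rho_bar E_bar)
rhs = trace(matmul(RHO, E1))                                # Tr(rho E)
```

Here `old_sum` has a zero in its last diagonal entry, which is exactly the defect that the term $\langle\alpha|E|\alpha\rangle (I - i \circ p)$ repairs.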
Thus, for example, to prove the no-go theorems of Spekkens [16]
and of Ferrie and Emerson [6], [7] (with appropriate clarifications as
discussed above in Section 3), it would suffice to prove them for two-
dimensional Hilbert spaces (in quantum computing terminology, one-
qubit spaces); the theorems would automatically carry over to all larger
Hilbert spaces. Because of the need for clarifications in these theorems,
we give, in Appendix B, a proof of a Spekkens-style no-go theorem for
Hilbert spaces of dimension two.

Remark 24. The proof of Proposition 23 involved choosing an arbitrary
unit vector $|\alpha\rangle$ in $\mathcal{H}$. This arbitrariness can be avoided when $\mathcal{H}$ is finite-
dimensional by averaging over all $|\alpha\rangle$'s. That is, if $\dim(\mathcal{H}) = d$, then
we can replace the definition of $\bar E$ in the proof with

\[ \bar E = i \circ E \circ p + \frac{\mathrm{Tr}(E)}{d}\,(I - i \circ p) \]

and, since $\mathrm{Tr}(I) = d$, the rest of the proof would work as before.


6. Bell’s Example and Symmetry 


Theorem 13 applies to all Hilbert spaces of dimension at least 2. We
cannot expect any sort of no-go result in lower dimensions, because
quantum theory in Hilbert spaces of dimensions 0 and 1 is trivial and
therefore classical. The second part of Theorem 19 applies only to
Hilbert spaces whose dimension is finite and at least 3. We have already
indicated in Remark 20 why the theorem fails in infinite dimensions and
in the first part of Theorem 19 why a modified version holds in infinite
dimensions. What about dimension 2?

Bell has given, in B 0. hidden-variable theories for a two- 
dimensional Hilbert space. More precisely, he has assigned to each 
pure state $|\psi\rangle$ in such a Hilbert space $\mathcal{H}$ a probability distribution on
value maps, such that the resulting probability distributions for any
observable agree with the predictions of quantum theory. In this section,
we summarize the improved version of Bell's example described
by Mermin [12], we simplify part of his argument, and we explain why
the example doesn't contradict Theorem 13.

We work with the Hilbert space $\mathcal{H}$ of 2-component vectors over $\mathbb{C}$,
so that operators on $\mathcal{H}$ are given by $2 \times 2$ matrices. Let $\vec\sigma$ be the
3-component "vector" whose entries are the Pauli matrices

\[ \sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \]

If $\vec n$ is any 3-component unit vector in $\mathbb{R}^3$, then the dot product $\vec n \cdot \vec\sigma$
is a Hermitian operator with eigenvalues $\pm 1$. Every pure state of $\mathcal{H}$
is an eigenstate, for eigenvalue $+1$, of $\vec n \cdot \vec\sigma$ for a unique $\vec n$. We use
the notation $|\vec n\rangle$ for this eigenstate. (If $\mathcal{H}$ represents the states of a
spin-$\frac12$ particle, then the operator $\frac12 \vec n \cdot \vec\sigma$ represents the spin component
in the direction $\vec n$, and so $|\vec n\rangle$ represents the state in which the spin is
definitely aligned in the direction $\vec n$. It is a special property of spin $\frac12$
that all pure states are of this form; for higher spins, a superposition
of states with definite spin directions need not have a definite spin
direction.)

Any observable, i.e., any Hermitian operator on $\mathcal{H}$, can be expressed
as $A = a_0 I + \vec a \cdot \vec\sigma$ for some scalar $a_0 \in \mathbb{R}$ and vector $\vec a \in \mathbb{R}^3$. Its
eigenvalues are $a_0 \pm \|\vec a\|$.

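The hidden-variable theory, as presented in [12, Section 3], assigns
to each state $|\vec n\rangle$ a family of sub-ensembles labeled by unit vectors
$\vec m \in \mathbb{R}^3$, the probability distribution of $\vec m$ being uniform on the unit
sphere in $\mathbb{R}^3$. In the sub-ensemble of $|\vec n\rangle$ given by $\vec m$, the observable
$a_0 I + \vec a \cdot \vec\sigma$ has the (definite) value

\[ \begin{cases} a_0 + \|\vec a\| & \text{if } (\vec m + \vec n) \cdot \vec a > 0, \\ a_0 - \|\vec a\| & \text{if } (\vec m + \vec n) \cdot \vec a < 0. \end{cases} \]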

Mermin writes that elementary integration confirms that, for any fixed
state $|\vec n\rangle$, the average over all $\vec m$ of the values assigned to an observable
$a_0 I + \vec a \cdot \vec\sigma$ agrees with the result $a_0 + \vec a \cdot \vec n$ predicted by quantum
mechanics. In fact, the required integration is so elementary that it
was done by Archimedes. All one needs is the theorem that, when a
sphere is cut by a plane, its area is divided in the same ratio as the
length of the diameter perpendicular to the plane. To verify that the
average over $\vec m$ of the values of $a_0 I + \vec a \cdot \vec\sigma$ in the state $|\vec n\rangle$ is $a_0 + \vec a \cdot \vec n$,
we begin with a couple of simplifications. First, we may assume that
$a_0 = 0$, because a general $a_0$ would just be added to both sides of the
equation that we are trying to prove. Second, thanks to the rotational
symmetry of the situation (where any rotation is applied to all three of
$\vec a$, $\vec n$ and $\vec m$), we may assume that the vector $\vec a$ points in the $z$-direction.
Finally, by scaling, we may assume that $\vec a = (0, 0, 1)$. So our task is to
prove that the average over $\vec m$ of the values assigned to $\sigma_z$ is $n_z$. By
definition, the value assigned to $\sigma_z$ is $\pm 1$, where the sign is chosen to
agree with that of $m_z + n_z$. In view of how $\vec m$ is chosen, this $m_z + n_z$
is the $z$-coordinate of a random point on the unit sphere centered at
$\vec n$. So the question reduces to determining what fraction of this sphere
lies above the $x$-$y$ plane. This plane cuts this unit sphere horizontally
at a level $n_z$ below the sphere's center. So, by Archimedes's theorem,
it divides the sphere's area in the ratio of $1 + n_z$ (above the plane) to
$1 - n_z$ (below the plane). That is, the value assigned to $\sigma_z$ is $+1$ with
probability $(1 + n_z)/2$ and $-1$ with probability $(1 - n_z)/2$. Thus, the
average value of $\sigma_z$ is $n_z$, as required.
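The agreement between the hidden-variable average and the quantum prediction can also be seen by simulation. The sketch below is ours and purely illustrative: the particular state direction, observable, sample size, and seed are arbitrary assumptions, and the function names are hypothetical. It samples $\vec m$ uniformly on the unit sphere, applies the value rule stated above, and compares the sample mean with $a_0 + \vec a \cdot \vec n$.

```python
import math
import random

def random_unit_vector(rng):
    """Uniform point on the unit sphere, via a normalised Gaussian triple."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def assigned_value(a0, a, m, n):
    """Value of a0*I + a.sigma in the sub-ensemble of |n> labelled by m."""
    sign_test = dot([mi + ni for mi, ni in zip(m, n)], a)
    norm_a = math.sqrt(dot(a, a))
    return a0 + norm_a if sign_test > 0 else a0 - norm_a

def hidden_variable_average(a0, a, n, samples=200_000, seed=0):
    """Monte Carlo average over m of the assigned values."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += assigned_value(a0, a, random_unit_vector(rng), n)
    return total / samples

n = [0.6, 0.0, 0.8]                 # unit vector labelling the state |n>
a0, a = 0.5, [1.0, -2.0, 2.0]       # observable a0*I + a.sigma, ||a|| = 3
estimate = hidden_variable_average(a0, a, n)
quantum = a0 + dot(a, n)            # quantum expectation a0 + a.n
```

With this many samples the statistical error of `estimate` is of order $10^{-2}$, so the two numbers should agree to about two decimal places; the exact agreement is what Archimedes's theorem provides.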

This hidden-variable theory can be viewed in the framework of Section 01.
Each of the vectors $\vec m + \vec n$ corresponds to a value map, namely
the map sending any observable $a_0 I + \vec a \cdot \vec\sigma$ to the value described
above. It is not difficult to verify that this is indeed a value map,
because there are so few commuting observables for our 2-dimensional
$\mathcal{H}$. Two observables commute if and only if their $\vec a$'s are parallel or
antiparallel. That is, they differ by only a scalar factor on the $\vec a \cdot \vec\sigma$ part
and an arbitrary change of the $a_0 I$ part.
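The commutation criterion is easy to test on concrete matrices. The following sketch is our own illustration, with arbitrarily chosen coefficient vectors and hypothetical helper names; it builds $a_0 I + \vec a \cdot \vec\sigma$ from the standard Pauli matrices and checks one antiparallel pair and one non-parallel pair.

```python
PAULI = (
    [[0, 1], [1, 0]],        # sigma_x
    [[0, -1j], [1j, 0]],     # sigma_y
    [[1, 0], [0, -1]],       # sigma_z
)

def observable(a0, a):
    """The 2x2 matrix a0*I + a.sigma."""
    return [[a0 * (r == c) + sum(a[k] * PAULI[k][r][c] for k in range(3))
             for c in range(2)] for r in range(2)]

def matmul(X, Y):
    return [[sum(X[r][k] * Y[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def commute(A, B, tol=1e-12):
    """True when every entry of AB - BA is (numerically) zero."""
    AB, BA = matmul(A, B), matmul(B, A)
    return all(abs(AB[r][c] - BA[r][c]) < tol
               for r in range(2) for c in range(2))

# Antiparallel vectors, different a0 parts: the observables commute.
antiparallel_case = commute(observable(0.3, (1, 2, 2)),
                            observable(-1.0, (-2, -4, -4)))
# Non-parallel vectors: the observables do not commute.
skew_case = commute(observable(0.3, (1, 2, 2)),
                    observable(-1.0, (2, 1, 2)))
```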

The mere existence of a value map (let alone a good probability
distribution on value maps for all the states) shows that, in Theorem 19,
the hypothesis of dimension $\geq 3$ cannot be weakened so as to allow
dimension 2.

What happens if we try to fit this hidden-variable theory into the
framework of Section 0]? A natural choice for $\Lambda$ is the space of all the
value maps obtained above, or, more geometrically, the space of their
parametrizations $\vec m + \vec n$. Since both $\vec m$ and $\vec n$ are unit vectors, $\Lambda$ will be
the ball of radius 2 centered at the origin of $\mathbb{R}^3$. For any pure state $|\vec n\rangle$,
the associated probability distribution $T(|\vec n\rangle\langle\vec n|)$ is the uniform distribution
on the two-dimensional surface of a unit sphere centered at $\vec n$,
because we are choosing $\vec m$ randomly while $\vec n$ is fixed. Notice that the
framework of Definition 0] does not handle this situation well, because
these probability distributions are not absolutely continuous with respect
to any natural probability distribution on $\Lambda$. (What a physicist
might call the probability density on $\Lambda$ associated to a state is not a
function but a distribution.) So we work instead with the framework of
Ferrie, Morris, and Emerson [8], as summarized in Definition 10 above,
or with the more liberal Definition 11.

Both of these definitions require a convex-linear map $T$ from the set
of density matrices (representing mixed states) to the set of
probability measures on $\Lambda$. The hidden-variable theory under consideration
has, so far, provided measures only for the pure states, i.e.,
the density matrices of the special form $|\vec n\rangle\langle\vec n|$; to such a density matrix,
it associated the uniform measure on the surface of the unit sphere
centered at $\vec n$. To obtain a probability representation, in either the
Ferrie-Morris-Emerson version or our version, we must extend this map
convex-linearly to all density matrices.

No such extension exists. Here is an example showing what goes
wrong. Consider the four pure states corresponding to spin in the
directions of the positive $x$, negative $x$, positive $z$ and negative $z$ axes.
The corresponding density operators are the projections

\[ \frac{I + \sigma_x}{2}, \quad \frac{I - \sigma_x}{2}, \quad \frac{I + \sigma_z}{2}, \quad \frac{I - \sigma_z}{2}, \]

respectively. Averaging the first two with equal weights, we get $\frac12 I$;
averaging the last two gives the same result. So a convex-linear extension
$T$ would have to assign to the density operator $\frac12 I$ the average of the
probability measures assigned to the pure states with spins in the $\pm x$
directions and also the average of the probability measures assigned to
pure states with spins in the $\pm z$ directions. But these two averages
are visibly very different. The first is concentrated on the union of two
unit spheres tangent to the $y$-$z$-plane at the origin, while the second is
concentrated on the union of two unit spheres tangent to the $x$-$y$-plane
at the origin.
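The obstruction can be made concrete with a few lines of arithmetic. In the sketch below (our illustration; the witness point and helper names are assumptions made for the example), the two equal-weight mixtures give the same density matrix, while the point $\vec m + \vec n = (2, 0, 0)$ lies in the support of the first averaged measure but not of the second.

```python
import math

def density(n):
    """(I + n.sigma)/2 as a 2x2 complex matrix, for a unit vector n."""
    x, y, z = n
    return [[(1 + z) / 2, (x - 1j * y) / 2],
            [(x + 1j * y) / 2, (1 - z) / 2]]

def mix(rho1, rho2):
    """Equal-weight average of two density matrices."""
    return [[(a + b) / 2 for a, b in zip(r1, r2)]
            for r1, r2 in zip(rho1, rho2)]

def on_unit_sphere_about(center, point, tol=1e-12):
    """True when point lies on the unit sphere centered at center."""
    return abs(math.dist(center, point) - 1.0) < tol

X_PLUS, X_MINUS = (1, 0, 0), (-1, 0, 0)
Z_PLUS, Z_MINUS = (0, 0, 1), (0, 0, -1)

mix_x = mix(density(X_PLUS), density(X_MINUS))   # I/2
mix_z = mix(density(Z_PLUS), density(Z_MINUS))   # I/2 again

# Hidden-variable point m + n with m = n = +x.
witness = (2, 0, 0)
in_x_support = any(on_unit_sphere_about(c, witness) for c in (X_PLUS, X_MINUS))
in_z_support = any(on_unit_sphere_about(c, witness) for c in (Z_PLUS, Z_MINUS))
```

Since `mix_x` and `mix_z` coincide while the supports differ, no single measure can serve as the image of $\frac12 I$ under a convex-linear $T$.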

Thus, Bell's example of a hidden-variable theory for 2-dimensional $\mathcal{H}$
does not fit the assumptions in any of the expectation no-go theorems.
It does not, therefore, clash with the fact that those theorems, unlike
the value no-go theorems, apply in the 2-dimensional case.

Another way to view this situation is as a demonstration that the
hypothesis of convex-linearity cannot be omitted from the expectation
no-go theorems. In comparison with Definition 10, which described the
hypotheses used by Ferrie, Morris, and Emerson [8], our Definition 11
dropped the requirement of convex-linearity for effects; Bell's example
shows that we cannot also drop that requirement for states.

In view of the idea of symmetry or even-handedness suggested by
Spekkens [16], one might ask whether there is a dual version of Theorem 13,
that is, a version that requires convex-linearity for effects but
looks only at pure states and does not require any convex-linearity for
states.

The answer is no; with such requirements there is a trivial example
of a successful hidden-variable theory, regardless of the dimension of
the Hilbert space, so there cannot be a no-go theorem. The example
can be concisely described as taking the quantum state itself as the
"hidden" variable. In more detail, let $\Lambda$ be the set of all states, i.e.,
the projective space obtained from the set of unit vectors of $\mathcal{H}$ by
identifying any two that differ only by a phase factor. Let $T$ assign to
each pure state $|\psi\rangle\langle\psi|$ the probability measure on $\Lambda$ concentrated at
the point $\lambda_{|\psi\rangle}$ that corresponds to the vector $|\psi\rangle$. Let $S$ assign to each
effect $E$ the function on $\Lambda$ defined by

\[ S(E)(\lambda_{|\psi\rangle}) = \langle\psi|E|\psi\rangle. \]

We have trivially arranged for this to give the correct expectation for
any effect $E$ and any pure state $|\psi\rangle$. The formula for $S(E)$ is clearly
convex-linear (in fact, linear) as a function of $E$. Of course, $T$ cannot
be extended convex-linearly to mixed states, so that Theorem 12 does
not apply.


Appendix A. Convex-Linearity 

As we pointed out near the end of Section 3.1, Spekkens [16] erroneously
claims that, if a function $f$ is convex-linear on a convex set $S$
of operators that span the space of Hermitian operators (and $f$ takes
the value zero on the zero operator if the latter is in $S$), then $f$ can be
uniquely extended to a linear function on this space.

The correct version of the result extends $f$ not to a linear function
but to a translated-linear function, i.e., a composition of translations and
a linear function. The rest of this section is devoted to a proof of this
fact, in its natural level of generality. It applies to arbitrary real vector
spaces; that the space consists of Hermitian operators is irrelevant.
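A one-dimensional toy case shows why "linear" must be weakened to "translated-linear". The example is ours and is not taken from [16]: we assume $V = W = \mathbb{R}$, the convex set $S = [1, 2]$ (which spans $\mathbb{R}$ and does not contain 0), and the map $f(v) = 3v + 1$. The sketch uses exact rational arithmetic to check that $f$ is convex-linear on a sample combination, that no linear extension exists, and that a translated-linear extension does.

```python
from fractions import Fraction as Fr

def f(v):
    """Assumed toy map, regarded as defined only on S = [1, 2]."""
    return 3 * v + 1

# Convex-linearity on one sample convex combination of points of S.
weights = [Fr(1, 4), Fr(1, 4), Fr(1, 2)]
points = [Fr(1), Fr(3, 2), Fr(2)]
combo = sum(w * p for w, p in zip(weights, points))
convex_linear_ok = f(combo) == sum(w * f(p) for w, p in zip(weights, points))

# A linear map on R has the form g(v) = c*v; agreeing with f at v = 1
# forces c = f(1), and then g disagrees with f at v = 2.
c = f(Fr(1))
linear_extension_ok = (c * Fr(2) == f(Fr(2)))

# Translated-linear extension w0 + h(v - u0), with u0 in S and h linear.
u0 = Fr(1)
w0 = f(u0)

def h(t):
    return 3 * t

def extension(v):
    return w0 + h(v - u0)

agrees_on_S = all(extension(p) == f(p) for p in points)
value_outside_S = extension(Fr(-5))   # defined on the whole affine hull R
```

The translation by $w_0$ is what absorbs the constant term that a linear map cannot produce.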

The convex hull, $\mathrm{Conv}(S)$, of a subset $S$ of a real vector space
$V$ consists of the convex combinations $a_1 v_1 + \cdots + a_n v_n$ of vectors
$v_1, \ldots, v_n \in S$ where $a_1 + \cdots + a_n = 1$ and every $a_i \geq 0$. The affine
hull, $\mathrm{Aff}(S)$, of $S$ consists of the affine combinations $a_1 v_1 + \cdots + a_n v_n$
of vectors $v_1, \ldots, v_n \in S$ where $a_1 + \cdots + a_n = 1$ but some coefficients
$a_i$ may be negative.

A set is convex if it contains all the convex combinations of its members;
similarly, it is an affine space if it contains all the affine combinations
of its members. An easy computation shows that convex hulls
are convex and affine hulls are affine spaces; that is, $\mathrm{Conv}(\mathrm{Conv}(S)) =
\mathrm{Conv}(S)$ and $\mathrm{Aff}(\mathrm{Aff}(S)) = \mathrm{Aff}(S)$.

An affine space $A$ in a vector space $V$ is said to be parallel to a linear
subspace $L$ of $V$ if $A = u_0 + L = \{u_0 + v : v \in L\}$ for some $u_0 \in V$. It
is easy to see that, if an affine space $A$ is parallel to a linear space $L$ as
above, then (i) $L$ is unique, (ii) $u_0 \in A$, (iii) any vector in $A$ can play
the role of the translator $u_0$, and (iv) $A$ is either equal to $L$ or disjoint
from $L$.

Lemma 25 (§1 in [H]). Any affine subspace $A$ of a real vector space
$V$ is parallel to a linear subspace $L$ of $V$.

In other words, any affine subspace is a translation of a linear subspace.
For example, in $\mathbb{R}^2$ we have that $\mathrm{Aff}\{(0,1), (1,0)\}$ is parallel
to the diagonal $y = -x$, and $\mathrm{Aff}\{(0,1), (1,0), (1,1)\}$ is (and thus is
parallel to) $\mathbb{R}^2$.

Proof. If $A$ contains the zero vector $0$ then it is a linear subspace.
Indeed, if $v \in A$ then any multiple $av = av + (1 - a)0 \in A$. And if
$u, v \in A$ then $u + v = 2(\frac12 u + \frac12 v) \in A$.

For the general case, let $u_0$ be any vector in the affine space $A$. It
suffices to show that $L = \{v - u_0 : v \in A\}$ is an affine space, because
then the preceding paragraph (applicable since $0 = u_0 - u_0 \in L$) shows
that it is a linear space, and clearly
$A = u_0 + L$. Any affine combination $a_1(v_1 - u_0) + \cdots + a_n(v_n - u_0)$
of vectors in $L$ (so the $v_i$ are in $A$ and the sum of the $a_i$ is 1) can be
rewritten as $(a_1 v_1 + \cdots + a_n v_n) - u_0$, which is in $L$. □

Let $V$ and $W$ be real vector spaces, $S$ a subset of $V$, $C = \mathrm{Conv}(S)$ its
convex hull, and $A = \mathrm{Aff}(S)$ its affine hull. Recall that a transformation
$f : C \to W$ is convex-linear on $S$ if

\[ f(a_1 v_1 + \cdots + a_n v_n) = a_1 f(v_1) + \cdots + a_n f(v_n) \]

for any convex combination $a_1 v_1 + \cdots + a_n v_n$ of vectors $v_i$ from $S$.
A transformation $f : A \to W$ is translated-linear if it has the form
$f(v) = w_0 + h(v - u_0)$ for some $w_0 \in W$, some $u_0 \in A$, and some linear
function $h : L \to W$ defined on the linear space $L = A - u_0$ parallel to
$A$.

Proposition 26. With notation as above, any transformation $f : C \to W$
that is convex-linear on $S$ has a unique extension to a translated-
linear function on $A$.

Proof. Notice first that translations $v \mapsto v - u_0$ and linear functions
both preserve affine combinations. A translated-linear function, being
the composition of two translations and a linear function, therefore
also preserves affine combinations. This observation implies the
uniqueness part of the proposition. Indeed, every element of $A$ is an
affine combination $a_1 s_1 + \cdots + a_n s_n$ of elements of $S$, and therefore any
translated-linear extension of $f$ must map it to $a_1 f(s_1) + \cdots + a_n f(s_n)$.

To prove the existence part of the proposition, it will be useful to
work with the graphs of functions. For any function $g : S \to W$ with
$S \subseteq V$, its graph is the subset of $V \oplus W$ consisting of the pairs $(s, g(s))$
for $s \in S$.* We record for future reference that the graph of $g$ is a linear
subspace of $V \oplus W$ if and only if the domain of $g$ is a linear subspace
of $V$ and $g$ is a linear transformation from that domain to $W$. We
also note that the projection $\pi : V \oplus W \to V : (v, w) \mapsto v$ is a linear
transformation that sends the graph of any $g$ to the domain of $g$.

* In set-theoretic foundations, a function is usually defined as a set of ordered
pairs, and so $g$ is the same thing as its graph.

In the situation of the proposition, let $f : C \to W$ be a transformation
that is convex-linear on $S$, and let $F \subseteq V \oplus W$ be its graph.
Also, let $F^-$ be the graph of the restriction of $f$ to $S$. Notice that the
convex-linearity of $f$ on $S$ means exactly that $F$ is the convex hull of
$F^-$. It follows that $F$ and $F^-$ have the same affine hull, because

\[ \mathrm{Aff}(F) = \mathrm{Aff}(\mathrm{Conv}(F^-)) \subseteq \mathrm{Aff}(\mathrm{Aff}(F^-)) = \mathrm{Aff}(F^-) \subseteq \mathrm{Aff}(F). \]

We claim that this affine hull $\mathrm{Aff}(F^-)$ is the graph of a function; that
is, it does not contain two distinct elements $(v, w)$ and $(v, w')$ with the
same first component $v$. To see this, suppose we had two such elements
in $\mathrm{Aff}(F) = \mathrm{Aff}(F^-)$, say

\[ (v, w) = a_1 (s_1, f(s_1)) + \cdots + a_m (s_m, f(s_m)) \]

and

\[ (v, w') = b_1 (t_1, f(t_1)) + \cdots + b_n (t_n, f(t_n)), \]

where all the $s_i$'s and $t_j$'s are in $S$ and where

\[ a_1 + \cdots + a_m = b_1 + \cdots + b_n \tag{2} \]

because both sides are equal to 1. So we have

\[ a_1 s_1 + \cdots + a_m s_m = b_1 t_1 + \cdots + b_n t_n \tag{3} \]

because both sides are equal to $v$, and we want to prove $w = w'$, i.e.,

\[ a_1 f(s_1) + \cdots + a_m f(s_m) = b_1 f(t_1) + \cdots + b_n f(t_n). \tag{4} \]

In the special case where all coefficients $a_i$ and $b_j$ are $\geq 0$, vector $v$ is in
$C$ and both sides of (4) are equal to $f(v)$. The general case reduces to
this special case as follows. In all three equations (2)-(4), move every
summand with a negative coefficient to the other side, and then divide
the resulting equations by the left part of the rearranged equation (2).
As a result we return to the special case already treated. Since the old
version of (4) follows from the new one, this completes the proof of our
claim that $\mathrm{Aff}(F) = \mathrm{Aff}(F^-)$ is the graph of a function.

By Lemma 25, the affine space $\mathrm{Aff}(F)$ is parallel to a linear subspace
$H$ of $V \oplus W$, say $\mathrm{Aff}(F) = (u_0, w_0) + H$, where $u_0 \in V$ and $w_0 \in W$.
From the fact that $\mathrm{Aff}(F)$ is the graph of a function, it follows
immediately that $H$ is also the graph of a function. Indeed, if $H$
contains $(v, w)$ and $(v, w')$, then $\mathrm{Aff}(F)$ contains $(v + u_0, w + w_0)$ and
$(v + u_0, w' + w_0)$, so $w + w_0 = w' + w_0$ and $w = w'$.

Let $h$ be the function whose graph is $H$. Because $H$ is a linear
subspace of $V \oplus W$, we know that $h$ is a linear transformation from
some linear subspace $L$ of $V$ into $W$.

The fact that $(u_0, w_0) + H = \mathrm{Aff}(F)$ tells us, by applying the linear
projection $\pi : V \oplus W \to V$, that $u_0 + L$ equals

\[ \pi(\mathrm{Aff}(F)) = \mathrm{Aff}(\pi(F)) = \mathrm{Aff}(C) = A, \]

where the first equality comes from linearity of $\pi$ and the second from
the fact that $F$ is the graph of the function $f$ whose domain is $C$. So $A$
is parallel to the linear subspace $L$ of $V$. Furthermore, for each $v \in C$,
we have

\[ (v, f(v)) \in F \subseteq \mathrm{Aff}(F) = (u_0, w_0) + H, \]

so $(v - u_0, f(v) - w_0)$ is in the graph $H$ of $h$. That is, $h(v - u_0) =
f(v) - w_0$ and so $f(v) = w_0 + h(v - u_0)$. Thus, the translated-linear
function $v \mapsto w_0 + h(v - u_0)$ is the desired extension of $f$. □

Remark 27. A linear function $h$ on a subspace $L$ of a vector space $V$
can be extended to a linear function $\bar h$ on all of $V$. Extend any basis
of $L$ to a basis of $V$, define $\bar h$ arbitrarily on the new basis vectors that
are not in $L$, and extend the resulting function by linearity to all of $V$.

For transformations defined on all of $V$, we have a simpler formula
for translated-linear functions, because

\[ w_0 + \bar h(v - u_0) = w_0 + \bar h(v) - \bar h(u_0) = \bar h(v) + w_1, \]

where $w_1 = w_0 - \bar h(u_0)$.

On the other hand, in contrast to Proposition 26, this $\bar h$ is not unique
(unless $L = V$).

Also, in the case of infinite-dimensional spaces, the extension process
requires the axiom of choice (to extend bases) and need not be well-
behaved with respect to natural topologies on the vector spaces.

Appendix B. No-Go Theorem for Spekkens Version 

This appendix is devoted to proving the following no-go theorem for
the original Spekkens version of probability representations, subject to
the clarifications discussed in Section 3.1.

Theorem 28. For a Hilbert space $\mathcal{H}$ of dimension at least two, there
is no probability representation (Spekkens version) subject to determinateness
and convex-linearity.

Proof. In view of Proposition 23, it suffices to prove the theorem under
the assumption that $\mathcal{H}$ has dimension exactly two.

To begin, we recall the form of density operators and effects in a
two-dimensional Hilbert space $\mathcal{H}$. A basis for the Hermitian operators
on $\mathcal{H}$ is given by the identity and the three Pauli matrices

\[ I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \]

It will be convenient to use vector notation, denoting the triple of
matrices $(\sigma_x, \sigma_y, \sigma_z)$ by $\vec\sigma$. Then the general Hermitian matrix looks
like

\[ wI + x\sigma_x + y\sigma_y + z\sigma_z = wI + \vec x \cdot \vec\sigma, \]

where $w$ and the three components of $\vec x$ are real numbers. The eigenvalues
of this Hermitian matrix are

\[ w \pm \sqrt{x^2 + y^2 + z^2} = w \pm \|\vec x\|. \]

In particular, the trace of this matrix is $2w$, and the matrix is positive
if and only if $w \geq \|\vec x\|$.

Density matrices are the Hermitian, positive matrices of trace 1, so
they have the form

\[ \rho = \rho(\vec x) = \tfrac12 (I + \vec x \cdot \vec\sigma), \]

where $\|\vec x\| \leq 1$. As indicated by the notation, we parametrize these
density matrices by three-component vectors $\vec x$ of norm $\leq 1$. The three-
dimensional ball that serves as the parameter space here is called the
Bloch sphere (with its interior).

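Similarly, effects have the form

\[ E = E(m, \vec p) = mI + p\sigma_x + q\sigma_y + r\sigma_z = mI + \vec p \cdot \vec\sigma \]

with

\[ \|\vec p\| \leq m \leq 1 - \|\vec p\| \]

(because $E$ and $I - E$ are positive operators) and therefore $\|\vec p\| \leq \frac12$.
The parameter space here, consisting of all four-component vectors
satisfying these inequalities, is a double cone over a three-dimensional
ball of radius $\frac12$.

We record for future reference the traces

\[ \mathrm{Tr}(I) = 2, \quad \mathrm{Tr}(\sigma_x) = \mathrm{Tr}(\sigma_y) = \mathrm{Tr}(\sigma_z) = 0 \]

and the multiplication table

\[ \sigma_x \sigma_y = -\sigma_y \sigma_x = i\sigma_z, \quad \sigma_y \sigma_z = -\sigma_z \sigma_y = i\sigma_x, \quad \sigma_z \sigma_x = -\sigma_x \sigma_z = i\sigma_y, \]

and

\[ \sigma_x^2 = \sigma_y^2 = \sigma_z^2 = I. \]

From these facts, it is easy to compute that

\[ \mathrm{Tr}(\rho(\vec x) E(m, \vec p)) = m + \vec x \cdot \vec p, \]

where the factor $\frac12$ in the definition of $\rho(\vec x)$ has cancelled the factor 2
arising from $\mathrm{Tr}(I)$.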
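The trace formula, and the eigenvalue claim $w \pm \|\vec x\|$ made earlier, can be spot-checked on explicit matrices. The sketch below is ours; the numerical values of $\vec x$, $m$, $\vec p$, and $w$ are arbitrary assumptions chosen to satisfy the stated constraints, and the helper names are hypothetical.

```python
PAULI = (
    [[0, 1], [1, 0]],        # sigma_x
    [[0, -1j], [1j, 0]],     # sigma_y
    [[1, 0], [0, -1]],       # sigma_z
)

def hermitian(w, vec):
    """The 2x2 matrix w*I + vec.sigma."""
    return [[w * (r == c) + sum(vec[k] * PAULI[k][r][c] for k in range(3))
             for c in range(2)] for r in range(2)]

def rho(x):
    """Density matrix (I + x.sigma)/2 for a vector x of norm at most 1."""
    return [[entry / 2 for entry in row] for row in hermitian(1, x)]

def effect(m, p):
    """Effect m*I + p.sigma, assuming ||p|| <= m <= 1 - ||p||."""
    return hermitian(m, p)

def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def trace(A):
    return A[0][0] + A[1][1]

def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x = (0.3, -0.4, 0.5)                 # ||x||^2 = 0.5, inside the Bloch ball
m, p = 0.4, (0.1, 0.2, -0.2)         # ||p|| = 0.3 <= min(m, 1 - m)

lhs = trace(matmul(rho(x), effect(m, p)))   # Tr(rho(x) E(m, p))
rhs = m + dot(x, p)                         # m + x.p

# Eigenvalues w + ||x|| and w - ||x|| have product w^2 - ||x||^2,
# which must equal the determinant of w*I + x.sigma.
w = 0.7
det_value = det(hermitian(w, x))
det_expected = w * w - dot(x, x)
```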

Given this background information, we are ready to prove Theorem 28.
Suppose, toward a contradiction, that we have a probability
representation (Spekkens version) satisfying determinateness and
convex-linearity, for a two-dimensional $\mathcal{H}$. In view of Proposition 26,
we know that

\[ \mu_{\rho(\vec x)}(\lambda) = \vec x \cdot \vec A(\lambda) + C(\lambda) \]

and

\[ \xi_{E(m, \vec p)}(\lambda) = \vec p \cdot \vec B(\lambda) + m D(\lambda) + F(\lambda) \]

for some nine functions $A_i(\lambda)$, $B_i(\lambda)$, $C(\lambda)$, $D(\lambda)$, $F(\lambda)$ where the index
$i$ ranges from 1 to 3. (The "translated" part of "translated-linear"
accounts for $C$ and $F$.)

The definition of probability representation (Spekkens version) leads
to some simplifications. $E(0, \vec 0)$ is the zero operator, whose associated
$\xi$ function is required to be identically zero. That gives us $F(\lambda) = 0$
for all $\lambda$, so we can simply omit $F$ from the formula for $\xi_{E(m, \vec p)}$.

Also, $E(1, \vec 0)$ is the identity operator, whose associated $\xi$ function is
required to be identically 1. That gives us $D(\lambda) = 1$ for all $\lambda$. So we
can simplify the $\xi$ formula above to read

\[ \xi_{E(m, \vec p)}(\lambda) = \vec p \cdot \vec B(\lambda) + m. \]

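Next, consider the requirement that

\[ \mathrm{Tr}(\rho(\vec x) E(m, \vec p)) = \int \xi_{E(m, \vec p)}(\lambda)\, \mu_{\rho(\vec x)}(\lambda)\, d\lambda. \]

We already evaluated the trace on the left side of this equation above,
finding $m + \vec x \cdot \vec p$. The integral on the right side is

\[ \int \big[ (\vec p \cdot \vec B(\lambda))(\vec x \cdot \vec A(\lambda)) + (\vec p \cdot \vec B(\lambda))\, C(\lambda) + m\, (\vec x \cdot \vec A(\lambda)) + m\, C(\lambda) \big]\, d\lambda. \]

Comparing the trace and the integral, and equating coefficients of the
various monomials in $m$, $\vec p$, and $\vec x$, we find that

\[ \int B_i(\lambda) A_j(\lambda)\, d\lambda = \delta_{ij}, \tag{5} \]

\[ \int B_i(\lambda) C(\lambda)\, d\lambda = 0, \tag{6} \]

\[ \int A_i(\lambda)\, d\lambda = 0, \text{ and} \tag{7} \]

\[ \int C(\lambda)\, d\lambda = 1. \tag{8} \]

Next, we extract as much information as we can from the assumption
that all the functions $\mu_\rho$ and $\xi_E$ are nonnegative.

In the case of $\xi_E$, this means that, as long as $\|\vec p\| \leq m, 1 - m$ (so
that $E(m, \vec p)$ is an effect), we must have $m + \vec p \cdot \vec B(\lambda) \geq 0$ for all $\lambda$.
Temporarily consider a fixed $\lambda$ and a fixed $m \in [0, \frac12]$. To get the
most information out of the inequality $m + \vec p \cdot \vec B(\lambda) \geq 0$, we choose
the "worst" vector $\vec p$, i.e., we make $\vec p \cdot \vec B(\lambda)$ as negative as possible,
by choosing $\vec p$ in the opposite direction to $\vec B(\lambda)$ and with the largest
permitted magnitude, namely $m$. That is, we take

\[ \vec p = -\frac{m}{\|\vec B(\lambda)\|}\, \vec B(\lambda) \]

so that our inequality becomes $0 \leq m(1 - \|\vec B(\lambda)\|)$, and therefore

\[ \|\vec B(\lambda)\| \leq 1 \quad \text{for all } \lambda. \]

Repeating the exercise for $m \in [\frac12, 1]$ gives no new information.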

So we turn to the case of $\mu_{\rho(\vec x)}$, for which the nonnegativity requirement
reads

\[ \vec x \cdot \vec A(\lambda) + C(\lambda) \geq 0. \]

For each fixed $\lambda$, we consider the "worst" $\vec x$, namely a vector $\vec x$ in the
direction opposite to $\vec A(\lambda)$ and with the maximum allowed magnitude,
namely 1. So we take

\[ \vec x = -\frac{\vec A(\lambda)}{\|\vec A(\lambda)\|} \]

and obtain the inequality $0 \leq -\|\vec A(\lambda)\| + C(\lambda)$. Thus, we have

\[ \|\vec A(\lambda)\| \leq C(\lambda) \quad \text{for all } \lambda. \]

In particular, $C(\lambda)$ is everywhere nonnegative.

A trivial consequence of $\|\vec A(\lambda)\| \leq C(\lambda)$ is that $|A_i(\lambda)| \leq C(\lambda)$.
Similarly, a trivial consequence of $\|\vec B(\lambda)\| \leq 1$ is $|B_i(\lambda)| \leq 1$. Putting
this information into the $i = j = 1$ case of equation (5), and also using
(8), we find that

\[ 1 = \int B_1(\lambda) A_1(\lambda)\, d\lambda \leq \int |B_1(\lambda)| \cdot |A_1(\lambda)|\, d\lambda \leq \int 1 \cdot C(\lambda)\, d\lambda = 1. \]

So both of the inequalities here must be equalities. In particular,
$|B_1(\lambda)| = 1$ for almost all $\lambda$ except where $C(\lambda) = 0$.

Similarly, we get that, for almost all $\lambda$ except where $C(\lambda) = 0$, we
also have $|B_2(\lambda)| = |B_3(\lambda)| = 1$ and therefore $\|\vec B(\lambda)\| = \sqrt{3}$. Since
we also know $\|\vec B(\lambda)\| \leq 1$, we must conclude that $C(\lambda) = 0$ almost
everywhere. But that contradicts equation (8), and so the proof of the
no-go theorem is complete. □